[Xapian-discuss] Matchspy and faceting
Andrew Betts
andrew.betts at assanka.net
Sat Aug 28 16:22:30 BST 2010
I have been working recently on a site that classifies posts using tags in
taxonomies, so a post about the oil spill in the Gulf of Mexico might be
tagged 'Subscribers only' (access level), 'Barack Obama' (person), 'Tony
Hayward' (person), 'BP' (company), 'Transocean' (company), 'Gulf of
Mexico' (location).
With some advice from Richard Boulton I started looking at using the
Matchspy branch for calculating facet suggestions. I have used the
stringlistserialiser to place a list of all the tags in each taxonomy in
value slots, one per taxonomy, and then a multivaluecountmatchspy on the
search side.
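To make the setup concrete, here is a rough sketch in plain Python (not the Xapian API) of the counting I expect from a spy over a multi-valued slot. The slot numbers, the sample documents and the helper names count_slot_values and top_values are all made up for illustration.

```python
from collections import Counter

# Hypothetical slot numbers, one per taxonomy (not taken from the real index).
SLOT_PERSON, SLOT_COMPANY = 0, 1

# Each matching document carries a list of tags in each slot.
matching_docs = [
    {SLOT_PERSON: ["Barack Obama", "Tony Hayward"], SLOT_COMPANY: ["BP", "Transocean"]},
    {SLOT_PERSON: ["Barack Obama"], SLOT_COMPANY: ["BP"]},
    {SLOT_PERSON: ["Tony Hayward"], SLOT_COMPANY: []},
]


def count_slot_values(docs, slot):
    """Count how many matching documents carry each tag in the given slot."""
    counts = Counter()
    for doc in docs:
        # A set guards against a tag being listed twice on one document.
        counts.update(set(doc.get(slot, [])))
    return counts


def top_values(counts, maxvalues):
    """Return up to maxvalues (tag, frequency) pairs, most frequent first,
    with ties broken alphabetically by tag."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:maxvalues]


company_counts = count_slot_values(matching_docs, SLOT_COMPANY)
print(top_values(company_counts, 5))
```

Each taxonomy gets its own counter, which is why I end up with one spy per value slot.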
However, I have a few questions.
1. I thought the method on the MVCMS was get_top_values but the class
seems to have the methods top_values_begin and top_values_end instead,
which do seem to work though I'm unsure what the arguments are. Can you
confirm this is right, and since I only found this by reading the source
code, is there any matchspy documentation?
2. It seems like you have to fetch an mset after attaching the matchspy
(even if you don't need one) before the matchspy will return results.
Is that right?
3. Can you attach multiple matchspies to the same enquire? I've tried to
do this but it seems to segfault somewhere on or after the third, so
currently I'm having to set up a new enquire for each matchspy (one per
taxonomy), which I assume means the query is run again each time.
4. What's the best strategy for achieving ideal facet suggestions? In
some taxonomies, e.g. people or companies, two or more tags are
regularly used on the same post, so it may make sense to suggest tags
from a matchspy on the query that is running. If your query includes
'barack obama', for example, you get facet suggestions in the people
taxonomy that frequently co-occur with Obama, and it would seem to make
sense to AND these with Obama if they are chosen. However, there are
other taxonomies in which tags are all mutually exclusive, e.g. 'access
level', where a post can only have one access level. There the logic
above would not produce any suggestions beyond the tag you've already
queried on. So it might be better to produce suggestions for each
taxonomy based on a query that excludes any selected tags in that
taxonomy but applies filters on tags selected in other taxonomies. I'm
interested in whether anyone's looked at this area of faceting in detail
and had this problem before.
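The second strategy in point 4 could be sketched roughly as follows, again in plain Python rather than against Xapian. The sample posts, the taxonomy names and the functions post_matches and suggest are hypothetical and only there to show the logic: when counting tags for one taxonomy, filter on the tags selected in every other taxonomy and ignore the selection in the taxonomy itself.

```python
from collections import Counter

# Hypothetical corpus: each post maps taxonomy name -> set of tags.
posts = [
    {"access": {"Subscribers only"}, "person": {"Barack Obama", "Tony Hayward"}},
    {"access": {"Free"}, "person": {"Barack Obama"}},
    {"access": {"Free"}, "person": {"Tony Hayward"}},
    {"access": {"Subscribers only"}, "person": {"Barack Obama"}},
]


def post_matches(post, selected):
    """True if the post carries every selected tag in every taxonomy (AND)."""
    return all(tags <= post.get(tax, set()) for tax, tags in selected.items())


def suggest(posts, selected, taxonomy):
    """Count tags in `taxonomy` over the posts that match the selections
    made in the other taxonomies only."""
    others = {tax: tags for tax, tags in selected.items() if tax != taxonomy}
    counts = Counter()
    for post in posts:
        if post_matches(post, others):
            counts.update(post.get(taxonomy, set()))
    # Tags already selected in this taxonomy are not worth suggesting again.
    for tag in selected.get(taxonomy, set()):
        counts.pop(tag, None)
    return counts


selected = {"access": {"Subscribers only"}, "person": {"Barack Obama"}}
for taxonomy in ("access", "person"):
    print(taxonomy, dict(suggest(posts, selected, taxonomy)))
```

With this approach the 'access' taxonomy still offers the alternative access level. The trade-off is that in a multi-valued taxonomy such as people, the suggested tags are no longer guaranteed to co-occur with the tag already selected there, so the two kinds of taxonomy may need different treatment.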
Best wishes,
Andrew