[Xapian-discuss] Matchspy and faceting

Andrew Betts andrew.betts at assanka.net
Sat Aug 28 16:22:30 BST 2010


I have recently been working on a site that classifies posts using tags 
in taxonomies, so a post about the oil spill in the Gulf of Mexico might 
be tagged 'Subscribers only' (access level), 'Barack Obama' (person), 
'Tony Hayward' (person), 'BP' (company), 'Transocean' (company) and 
'Gulf of Mexico' (location).

With some advice from Richard Boulton I started looking at using the 
Matchspy branch for calculating facet suggestions.  At index time I use 
the stringlistserialiser to store each post's tags in value slots, one 
slot per taxonomy, with each slot holding the list of that post's tags 
in that taxonomy.  On the search side I then use a 
multivaluecountmatchspy.
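
To make the setup concrete, here is a minimal pure-Python sketch of the 
counting that a multi-valued facet spy is expected to perform over the 
matching posts.  It does not use the Xapian API at all: the function name 
top_tag_counts, the dict-based post layout and the sample data are 
assumptions made purely for illustration.

```python
from collections import Counter


def top_tag_counts(matching_posts, taxonomy, max_values):
    """Return up to max_values (tag, count) pairs for one taxonomy.

    matching_posts is a list of dicts mapping a taxonomy name to the
    list of tags a post carries in that taxonomy (hypothetical layout).
    """
    counts = Counter()
    for post in matching_posts:
        # set() so a tag repeated on one post is only counted once
        counts.update(set(post.get(taxonomy, [])))
    # Highest count first; ties broken by tag name for a stable order
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_values]


# Hypothetical posts standing in for the documents matched by a query
posts = [
    {"person": ["Barack Obama", "Tony Hayward"], "company": ["BP", "Transocean"]},
    {"person": ["Barack Obama"], "company": ["BP"]},
    {"person": ["Tony Hayward"], "company": ["BP"]},
]

top_companies = top_tag_counts(posts, "company", 2)
top_people = top_tag_counts(posts, "person", 1)
```

With this sample data, 'BP' is counted three times and 'Transocean' once, 
so one value slot per taxonomy yields one ranked list per taxonomy.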

However, I have a few questions.

1. I thought the method on the multivaluecountmatchspy (MVCMS) was 
get_top_values, but the class seems to have the methods top_values_begin 
and top_values_end instead.  These do seem to work, though I'm unsure 
what the arguments are.  Can you confirm this is right?  Also, since I 
only found this by reading the source code, is there any matchspy 
documentation?

2. It seems like you have to fetch an mset after attaching the matchspy 
(even if you don't need one) before the matchspy will return results. 
Is that right?

3. Can you attach multiple matchspies to the same enquire?  I've tried 
to do this, but it seems to segfault somewhere on or after the third, so 
currently I'm having to set up a new enquire for each matchspy (one per 
taxonomy), which I assume means the query is being run again each time.

4. What's the best strategy for achieving ideal facet suggestions?  In 
some taxonomies, e.g. people or companies, two or more tags are 
regularly used on the same post.  There it may make sense to suggest 
tags from a matchspy on the query that is running, so if your query 
includes 'barack obama', for example, you get facet suggestions in the 
people taxonomy that frequently co-occur with Obama.  It would seem to 
make sense to AND these with Obama if they are chosen.  However, there 
are other taxonomies in which tags are all mutually exclusive, e.g. 
'access level', where a post can only have one access level.  In those, 
the logic above would not produce any suggestions beyond the tag you've 
already queried on.  So it might be better to produce suggestions for 
each taxonomy based on a query that excludes any selected tags in that 
taxonomy but applies filters on tags selected in other taxonomies.  I'm 
interested in whether anyone has looked at this area of faceting in 
detail and run into this problem before.
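
The two strategies in point 4 can be sketched in pure Python as follows. 
This is only an illustration under stated assumptions, not Xapian code: 
the function facet_suggestions, the 'exclusive' parameter, the dict-based 
post layout and the 'Free' access level are all hypothetical.  For a 
taxonomy marked as mutually exclusive, the selection in that taxonomy is 
dropped from the filter; for the others, every selection is ANDed.

```python
from collections import Counter


def facet_suggestions(posts, selected, exclusive, max_values=5):
    """Suggest tags per taxonomy.

    posts: list of dicts mapping taxonomy name to a list of tags.
    selected: dict mapping taxonomy name to the set of chosen tags.
    exclusive: set of taxonomies whose tags are mutually exclusive.
    """
    taxonomies = sorted({name for post in posts for name in post})
    suggestions = {}
    for taxonomy in taxonomies:
        # For an exclusive taxonomy, ignore its own selection so that
        # alternatives can still be counted; otherwise AND everything.
        filters = {
            name: tags
            for name, tags in selected.items()
            if not (name == taxonomy and taxonomy in exclusive)
        }
        counts = Counter()
        for post in posts:
            if all(tags <= set(post.get(name, []))
                   for name, tags in filters.items()):
                counts.update(set(post.get(taxonomy, [])))
        # Never suggest a tag that is already selected
        for tag in selected.get(taxonomy, set()):
            counts.pop(tag, None)
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        suggestions[taxonomy] = ranked[:max_values]
    return suggestions


# Hypothetical sample data; 'Free' is an invented access level
posts = [
    {"access": ["Subscribers only"], "person": ["Barack Obama", "Tony Hayward"]},
    {"access": ["Free"], "person": ["Barack Obama"]},
    {"access": ["Subscribers only"], "person": ["Tony Hayward"]},
]
selected = {"access": {"Subscribers only"}, "person": {"Barack Obama"}}

result = facet_suggestions(posts, selected, exclusive={"access"})
no_exclusive = facet_suggestions(posts, selected, exclusive=set())
```

In this sample, treating 'access' as exclusive still offers 'Free' as an 
alternative access level, whereas ANDing every selection leaves the 
'access' suggestions empty, which is the problem described above.  The 
cost is one filtered pass per taxonomy rather than a single shared one.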

Best wishes,

Andrew


