[Xapian-discuss] faceted searches

Alexander Lind malte at webstay.org
Mon Aug 27 22:32:03 BST 2007


> I (and Olly) have actually been working on exactly this - there is 
> support for doing this in SVN HEAD, but we may still tweak the API for 
> it a bit more before the next release.
Wonderful!
>
> Basically, we've added a "matchspy" interface, which works very like a 
> match decider, except that Xapian guarantees to call the matchspy for 
> every document which matches the query and is "considered for the 
> MSet" (various optimisations mean that this guarantee is not provided 
> for MatchDeciders - in future, we plan to call them as late as 
> possible in the match process, so they won't see quite a few of the 
> matching documents).  Note that not all the matching documents will 
> always be "considered for the MSet" - this is a little complicated, 
> but basically you can guarantee that at least as many documents as are 
> specified in the "checkatleast" parameter to get_mset() will be 
> considered (if there are actually that many matching documents), but more 
> documents may also be considered.  Thus, a matchspy can easily be set 
> to be called on a much larger number of matches than are returned in 
> the mset, but a limit on the number of matches passed to the matchspy 
> also applies, to avoid slow processing for very large result sets.
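
To illustrate the distinction described above (the spy sees every "considered" document, while the MSet holds only the top few), here is a toy Python model. It is a hypothetical sketch, not Xapian's API: `run_match`, `Document` and the spy callback signature are invented for illustration. It models only the "checkatleast" guarantee and ignores both the extra documents that may be considered and the upper limit on what the spy receives.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a document: value slot number -> value string.
Document = Dict[int, str]

def run_match(
    matches: List[Tuple[float, Document]],
    maxitems: int,
    checkatleast: int,
    spy: Callable[[Document, float], None],
) -> List[Tuple[float, Document]]:
    """Toy match loop: the spy is called for every considered document,
    but only the best `maxitems` of them are returned as the 'MSet'."""
    # Consider at least `checkatleast` documents (or all, if fewer match),
    # and never fewer than are needed to fill the MSet.
    limit = max(maxitems, checkatleast)
    considered = matches[:limit]
    for weight, doc in considered:
        spy(doc, weight)
    # Return the highest-weighted `maxitems` of the considered documents.
    ranked = sorted(considered, key=lambda m: m[0], reverse=True)
    return ranked[:maxitems]

seen: List[Document] = []
matches = [(float(i), {0: "red" if i % 2 else "blue"}) for i in range(10)]
mset = run_match(matches, maxitems=3, checkatleast=8,
                 spy=lambda doc, wt: seen.append(doc))
```

Here the spy sees 8 documents while the MSet contains only 3, which is what makes a spy suitable for gathering facet counts over more than the displayed page of results.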

Sounds brilliant. I did see the new matchspy addition in the API docs 
the other day, and it seemed like the beginnings of a faceting feature, 
but I didn't make the connection entirely.
Will the matchspy function be portable to the bindings, PHP in particular?
>
> We also provide a couple of standard MatchSpy implementations (defined 
> in the new header file "matchspy.h").  One of these counts up the 
> occurrences of values in specified slots in the documents which are 
> presented to it: if the facets are stored in the value slots, this 
> gives the matching facets.
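
A value-counting spy of the kind described above could look roughly like the following Python sketch. The class name `ValueCountSpy` and its interface are hypothetical and are not the contents of "matchspy.h"; the sketch only shows the idea of tallying the values found in a chosen set of slots for each document presented to it.

```python
from collections import Counter
from typing import Dict, Iterable

class ValueCountSpy:
    """Hypothetical sketch: count the values found in selected slots of
    every document passed to it (not Xapian's actual class)."""

    def __init__(self, slots: Iterable[int]) -> None:
        self.slots = list(slots)
        self.counts: Dict[int, Counter] = {s: Counter() for s in self.slots}
        self.total = 0  # number of documents the spy has seen

    def __call__(self, doc: Dict[int, str], weight: float = 0.0) -> None:
        self.total += 1
        for slot in self.slots:  # only the requested slots are examined
            value = doc.get(slot)
            if value is not None:
                self.counts[slot][value] += 1

docs = [{0: "red", 1: "small"}, {0: "blue", 1: "small"}, {0: "red"}]
spy = ValueCountSpy(slots=[0, 1])
for d in docs:
    spy(d)
print(spy.counts[0].most_common())  # [('red', 2), ('blue', 1)]
```

In this sketch, the spy examines only the slots it was given, not every value in each document.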

Would this traverse all values in each document it processes, or just 
look at specific positions given to it?
>
> We've also implemented some code such that a facet can contain 
> arbitrary numbers (serialised to strings in an appropriate way), and 
> some code which will pick appropriate ranges to use to divide these 
> numbers into a manageable number of groups.  This can be used to allow 
> a numerical value (e.g., price) to be used as a facet.
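
One plausible way to "pick appropriate ranges" is to split the sorted values into groups holding roughly equal numbers of documents. The Python sketch below does that; `pick_ranges` is a hypothetical helper, and equal-frequency bucketing is an assumed strategy, not necessarily what the Xapian code does. It works on already-decoded numbers, not on the serialised strings.

```python
from typing import List, Tuple

def pick_ranges(values: List[float], max_groups: int) -> List[Tuple[float, float, int]]:
    """Split numeric facet values into at most `max_groups` contiguous
    ranges of roughly equal size. Returns (low, high, count) tuples."""
    if not values or max_groups < 1:
        return []
    ordered = sorted(values)
    n = len(ordered)
    ranges: List[Tuple[float, float, int]] = []
    start = 0
    for g in range(max_groups):
        end = (g + 1) * n // max_groups  # planned exclusive end of group g
        if end <= start:
            continue  # an earlier group already absorbed these values
        # Extend the group so that equal values never straddle two ranges.
        while end < n and ordered[end] == ordered[end - 1]:
            end += 1
        ranges.append((ordered[start], ordered[end - 1], end - start))
        start = end
        if start >= n:
            break
    return ranges

prices = [5.0, 10.0, 10.0, 12.0, 20.0, 25.0, 30.0, 99.0]
print(pick_ranges(prices, 3))
# [(5.0, 10.0, 3), (12.0, 20.0, 2), (25.0, 99.0, 3)]
```

Equal-frequency ranges keep an outlier such as 99.0 from leaving most groups empty, which equal-width ranges would do here.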

Fantastic.
Will it also be possible to facet on indexed keywords (e.g. COLORblue)?  
Or will it be better to assign each color a value and store it in its 
own position in the value list?

Any idea of when we might see Xapian 1.0.3 in the wild? :)

Cheers
Alec


