[Xapian-discuss] faceted searches

Richard Boulton richard at lemurconsulting.com
Mon Aug 27 10:23:53 BST 2007


Alexander Lind wrote:
> I've been thinking about implementing some faceting support in my Xapian 
> classes.
> Has anyone implemented anything like this before?

I (and Olly) have actually been working on exactly this - there is 
support for doing this in SVN HEAD, but we may still tweak the API for 
it a bit more before the next release.

Basically, we've added a "matchspy" interface, which works very like a 
match decider, except that Xapian guarantees to call the matchspy for 
every document which matches the query and is "considered for the MSet" 
(various optimisations mean that this guarantee is not provided for 
MatchDeciders - in future, we plan to call them as late as possible in 
the match process, so they won't see quite a few of the matching 
documents).  Note that not all the matching documents will always be 
"considered for the MSet" - this is a little complicated, but basically 
you can guarantee that at least as many documents as are specified in 
the "checkatleast" parameter to get_mset() will be considered (if there 
actually that many matching documents), but more documents may also be 
considered.  Thus, a matchspy can easily be set to be called on a much 
larger number of matches than are returned in the mset, but a limit on 
the number of matches passed to the matchspy also applies, to avoid slow 
processing for very large result sets.

We also provide a couple of standard MatchSpy implementations (defined 
in the new header file "matchspy.h").  One of these counts up the 
occurrences of values in specified slots in the documents which are 
presented to it: if the facets are stored in the value slots, this gives 
the matching facets.

We've also implemented some code such that a facet can contain arbitrary 
numbers (serialised to strings in an appropriate way), and some code 
which will pick appropriate ranges to use to divide these numbers into a 
managable number of groups.  This can be used to allow a numerical value 
(eg, price) to be used as a facet.

 > It seems like Omega does not support faceted searches, is this correct?

That is (currently) correct.  I don't have plans to add it, but it 
wouldn't be terribly hard to add using the support which will be in 1.0.3.

> What I would like to do specifically, is issue a query like
> "bicycle facet:color"
> 
> Alternative 1:
> 1. Perform main search (for bicycle).
> 2. Perform as many searches on the returned mset as there are facets (in 
> this case all the colors).
> 3. Store the count for each color, and return it together with the 
> search results from point 1.
> 
> Alternative 2:
> 1. Perform main search (for bicycle).
> 2. Iterate through the entire mset, counting occurrences of each color, 
> and storing the counts.
> 3. Return counts and results from point 1.
> 
> Can anyone say if there are any pitfalls with either of these 
> strategies?

Alternative 1 doesn't scale to facets with very many possible values: a 
search needs to be performed for every possible value.

Alternative 2 will almost work, but requires the calculated mset to be 
very large - we normally only calculate a small subset of the mset (eg, 
top 10 documents), but we'd like the counts for the facet values to be 
relevant to all the matching documents, not just the top few matches.

The matchspy approach has neither of these pitfalls.

-- 
Richard



More information about the Xapian-discuss mailing list