[Xapian-discuss] faceted searches
Alexander Lind
malte at webstay.org
Mon Aug 27 22:32:03 BST 2007
> I (and Olly) have actually been working on exactly this - there is
> support for doing this in SVN HEAD, but we may still tweak the API for
> it a bit more before the next release.
Wonderful!
>
> Basically, we've added a "matchspy" interface, which works very much like a
> match decider, except that Xapian guarantees to call the matchspy for
> every document which matches the query and is "considered for the
> MSet" (various optimisations mean that this guarantee is not provided
> for MatchDeciders - in future, we plan to call them as late as
> possible in the match process, so they won't see quite a few of the
> matching documents). Note that not all the matching documents will
> always be "considered for the MSet" - this is a little complicated,
> but basically you can guarantee that at least as many documents as are
> specified in the "checkatleast" parameter to get_mset() will be
> considered (if there are actually that many matching documents), but more
> documents may also be considered. Thus, a matchspy can easily be set
> to be called on a much larger number of matches than are returned in
> the mset, but a limit on the number of matches passed to the matchspy
> also applies, to avoid slow processing for very large result sets.
Sounds brilliant. I did see the new matchspy addition in the API docs
the other day, and it seemed like the beginnings of a faceting feature,
but I didn't make the connection entirely.
Will the matchspy function be portable to the bindings, PHP in particular?
>
> We also provide a couple of standard MatchSpy implementations (defined
> in the new header file "matchspy.h"). One of these counts up the
> occurrences of values in specified slots in the documents which are
> presented to it: if the facets are stored in the value slots, this
> gives the matching facets.
Would this traverse all values in each document it processes, or just
look at specific positions given to it?
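
To check my understanding of the "specified slots" reading, here is a rough sketch in plain Python. It does not use the Xapian API at all: documents are modelled as hypothetical dicts mapping slot number to value, and count_slot_values is a name I made up for illustration.

```python
from collections import Counter

def count_slot_values(docs, slots):
    """Count how often each value occurs in the given slots.

    docs:  iterable of dicts mapping slot number -> value string
           (a stand-in for the documents a matchspy would be shown).
    slots: the slot numbers to inspect; all other slots are ignored.
    """
    counts = {slot: Counter() for slot in slots}
    for doc in docs:
        for slot in slots:
            value = doc.get(slot, "")
            if value:  # a missing or empty value means "no facet here"
                counts[slot][value] += 1
    return counts

# Hypothetical data: slot 0 holds a colour, slot 1 a product type.
docs = [
    {0: "blue", 1: "shirt"},
    {0: "red", 1: "shirt"},
    {0: "blue", 2: "not inspected"},
]
facets = count_slot_values(docs, slots=[0, 1])
print(facets[0].most_common())
print(facets[1].most_common())
```

If it works like this, the cost per document depends only on the number of slots asked for, not on how many values the document has.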
>
> We've also implemented some code such that a facet can contain
> arbitrary numbers (serialised to strings in an appropriate way), and
> some code which will pick appropriate ranges to use to divide these
> numbers into a manageable number of groups. This can be used to allow
> a numerical value (e.g. price) to be used as a facet.
Fantastic.
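
I don't know how the range-picking code actually chooses its ranges, so the following is only a guess at the general idea: a plain Python sketch that splits numbers into at most max_groups equal-width ranges and counts how many fall into each. The function name bucket_ranges and the sample prices are made up, and equal-width buckets are my assumption, not necessarily what the Xapian code does.

```python
def bucket_ranges(values, max_groups):
    """Group numbers into at most max_groups equal-width ranges.

    Returns a list of (low, high, count) tuples for the non-empty
    ranges, in ascending order.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single degenerate range.
        return [(lo, hi, len(values))]
    width = (hi - lo) / max_groups
    counts = [0] * max_groups
    for v in values:
        # The maximum value would land in bucket max_groups, so clamp it
        # into the last bucket.
        idx = min(int((v - lo) / width), max_groups - 1)
        counts[idx] += 1
    return [
        (lo + i * width, lo + (i + 1) * width, c)
        for i, c in enumerate(counts)
        if c > 0
    ]

# Hypothetical prices for the documents that matched a query.
prices = [5, 12, 18, 40, 95, 100]
for low, high, count in bucket_ranges(prices, max_groups=4):
    print(f"{low:.2f} to {high:.2f}: {count}")
```

Something smarter would presumably pick round-number boundaries, but even this gives a price facet with a bounded number of choices.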
Will it also be possible to facet on indexed keywords (e.g. COLORblue)?
Or will it be better to assign each color a value and store it in its
own position in the value list?
Any idea of when we might see Xapian 1.0.3 in the wild? :)
Cheers
Alec