[Xapian-discuss] Filter or MatchDecider or Other?
Olly Betts
olly at survex.com
Fri Aug 20 01:44:20 BST 2004
On Thu, Aug 19, 2004 at 06:24:36PM -0400, Mike Boone wrote:
> My existing search tool allows me to limit the search results based on other
> parameters besides the text content. For example, each document has an
> associated category, and I can limit my search to only a certain category.
> Similarly, I can limit the search by a geographic region, or by a date
> range.
>
> I've built a Xapian-database of just the text content, and the simplesearch
> tool seems to return relevant results as expected. What's the best way to
> add the limiting to the search?
>
> It looks like I could maybe add a term of something like
> 'Location:Location1' to the end of my document and then add OP_FILTER to the
> query and require that piece of data.
That's the best way to filter on pre-defined categories. Pick a syntax
for the term which won't clash with terms generated from the text
(including a colon like you suggest is fine; the convention Omega uses
is that terms from text are lower case, so capital letter prefixes are
used for filter terms).
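To make the idea concrete, here is a minimal sketch of why a prefixed filter term can't clash with text terms and how the filter restricts the result set. It uses a toy in-memory inverted index, not the Xapian API, and the "XLOC:" prefix and the function names are made up for illustration.

```python
# Toy inverted index: term -> set of document ids.
# This only illustrates the idea; it is not how Xapian is called.
index = {}

# Hypothetical filter prefix. It contains capital letters, so it can never
# collide with the lower-cased terms generated from the text.
LOCATION_PREFIX = "XLOC:"


def index_document(docid, text, location):
    """Add lower-cased text terms plus one prefixed filter term."""
    for word in text.lower().split():
        index.setdefault(word, set()).add(docid)
    index.setdefault(LOCATION_PREFIX + location, set()).add(docid)


def search(word, location):
    """Return docs matching the text term, filtered by the location term."""
    text_matches = index.get(word.lower(), set())
    filter_matches = index.get(LOCATION_PREFIX + location, set())
    # The filter only restricts the set of matches (an intersection).
    return sorted(text_matches & filter_matches)
```

Both posting lists already exist in the index, so the filter is just an intersection of two precomputed lists.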
> Or perhaps I could set the category as a document value and then
> implement the MatchDecider to limit on that?
That's likely to be less efficient than using OP_FILTER. A document
value is designed to be fast to access, but it still has to be fetched
and checked for each candidate document, whereas using a term means the
list of matching document ids is effectively precalculated.
A MatchDecider is useful when the decision is more complicated - for
example you might want to restrict results to "within X miles of P" which
would be hard to do efficiently with OP_FILTER, but a MatchDecider could
take the coordinates from a value and calculate the distance from P,
saying "yes" only if that distance is less than X.
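As a rough sketch of the decision logic only (plain Python, not tied to the MatchDecider class or any binding), assuming the coordinates are stored in a value as a "lat,lon" string. The value format, the function names, and the use of the haversine formula are all assumptions made for this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def make_decider(p_lat, p_lon, max_miles):
    """Build a function saying "yes" only if a document is within max_miles of P."""
    def decide(stored_value):
        # stored_value is the hypothetical "lat,lon" string kept in a value slot.
        lat, lon = (float(part) for part in stored_value.split(","))
        return distance_miles(p_lat, p_lon, lat, lon) < max_miles
    return decide
```

The function returned by make_decider would be called once per candidate document, which is why this approach costs more than a term filter.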
For a date range, I'd suggest considering the scheme Omega uses - there
we generate terms for the date (e.g. D20040820), the month (e.g. M200408),
and the year (e.g. Y2004). This allows a long date range to be
represented as a relatively small number of terms OR-ed together. If
you indexed only the D terms, a one-year span would require 365 terms
(366 in a leap year).
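Here is a minimal sketch of how a date range might be broken down into such terms, assuming every document has been indexed with all three of its D, M and Y terms. The function name and the greedy strategy are my own for illustration, not taken from Omega's code.

```python
import calendar
from datetime import date, timedelta


def date_range_terms(start, end):
    """Return the D/M/Y terms covering start..end inclusive (to be OR-ed)."""
    terms = []
    day = start
    while day <= end:
        # A whole calendar year fits: use a single Y term.
        if day.month == 1 and day.day == 1 and date(day.year, 12, 31) <= end:
            terms.append("Y%04d" % day.year)
            day = date(day.year + 1, 1, 1)
            continue
        # A whole calendar month fits: use a single M term.
        last_day = calendar.monthrange(day.year, day.month)[1]
        month_end = date(day.year, day.month, last_day)
        if day.day == 1 and month_end <= end:
            terms.append("M%04d%02d" % (day.year, day.month))
            day = month_end + timedelta(days=1)
            continue
        # Otherwise fall back to a single-day D term.
        terms.append("D%04d%02d%02d" % (day.year, day.month, day.day))
        day += timedelta(days=1)
    return terms
```

For example, date_range_terms(date(2004, 8, 30), date(2004, 10, 1)) gives the two D terms for the end of August, M200409, and D20041001, rather than 33 separate D terms.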
> I haven't found the documentation very clear as to which way will work, or
> which way is preferred/faster. Please point me in the right direction.
Thanks for the feedback - I'll slot the above suggestions into the
documentation in a suitable place.
> BTW, I'm developing this with the PHP Xapian bindings, please let me know if
> a certain feature won't work under this setup.
I'm not certain if MatchDecider is supported by the PHP bindings. You
need to be able to sub-class it to use it, or supply the decision
function in some other way. Hopefully someone with more knowledge of
the bindings than me can give a more definitive answer...
Cheers,
Olly