[Xapian-discuss] writing match deciders / custom handling of terms
Olly Betts
olly at survex.com
Tue Nov 11 12:38:04 GMT 2008
2008/11/10 djcb <djcb.bulk at gmail.com>:
> Now, my question is about the MatchDeciders (I think). Suppose I have a
> query to find some messages in my Xapian DB, e.g:
>
> subject:foo AND flags:A
>
> which would match messages with subject 'foo' that also have flag 'A'
> (having attachments). In the database, flags are just a number. So, I
> need some custom handling of this 'flags:A' term, and match the
> appropriate documents.
>
> Now, it seems(?) that MatchDeciders are the way to go -- but I don't see
> a way to do the custom handling of the flags parameter -- am I missing
> something simple?
The QueryParser doesn't (at least currently) allow you to generate
a MatchDecider - you need to add it separately.
In this case I'd probably just generate a term for each flag at index time
and use QueryParser::add_boolean_prefix() to map "flags:" onto those terms.
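
A minimal sketch of the index-time side of that suggestion, in plain Python
with no Xapian calls. The bit values, the flag letters and the "XF" prefix
are all assumptions made up for illustration; none of them come from this
thread. The point is only that a numeric flags field can be expanded into
one prefixed term per set bit, by string concatenation.

```python
# Hypothetical flag bits and letters; the real application defines its own.
FLAG_LETTERS = {
    0x01: "a",  # has attachments (assumed meaning)
    0x02: "n",  # new (assumed meaning)
    0x04: "s",  # seen (assumed meaning)
}

# Hypothetical term prefix reserved for flag terms.
FLAG_PREFIX = "XF"


def flag_terms(flags):
    """Return one prefixed term for every flag bit set in `flags`."""
    return [
        FLAG_PREFIX + letter
        for bit, letter in sorted(FLAG_LETTERS.items())
        if flags & bit
    ]


if __name__ == "__main__":
    # A message whose stored flags number has bits 0x01 and 0x04 set.
    print(flag_terms(0x05))
```

Each returned string would be added to the document as a boolean term, and
the same prefix would be registered with the QueryParser for the "flags"
field so that the query can filter on it.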
> [2] But: there are some things that seem a bit strange though; e.g. there seems
> to be no API to add the prefix to add_term, requiring me to manually
> prefix the strings, which seems a bit hackish...
Well, TermGenerator can do prefixing for you. But it's mostly just string
concatenation anyway.
> and the Xapian::Sorter
> which returns a string, which is then sorted; I was expecting something
> similar to std::less, or GCompareFunc in GLib
The reason for generating the sort key rather than offering a comparator
is mostly down to the number of callbacks required - a comparator gets
called O(n log n) times, while a sort key is generated once per document,
so O(n) times. Since n can easily be millions, this can make quite a
difference.
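
To make the callback counts concrete, here is a small sketch that sorts the
same shuffled list twice with Python's built-in sort, once through a
comparator and once through a key function, and counts how often each user
callback runs. It says nothing about Xapian's internals; the function name
count_callbacks and the list size are just illustrative assumptions.

```python
import functools
import random


def count_callbacks(n, seed=0):
    """Sort n shuffled integers both ways; return (cmp_calls, key_calls)."""
    values = list(range(n))
    random.Random(seed).shuffle(values)

    cmp_calls = 0

    def compare(a, b):
        # Comparator style: called once per comparison the sort performs.
        nonlocal cmp_calls
        cmp_calls += 1
        return (a > b) - (a < b)

    key_calls = 0

    def make_key(v):
        # Sort-key style: called exactly once per item.
        nonlocal key_calls
        key_calls += 1
        return v

    by_cmp = sorted(values, key=functools.cmp_to_key(compare))
    by_key = sorted(values, key=make_key)
    assert by_cmp == by_key  # both approaches give the same order
    return cmp_calls, key_calls


if __name__ == "__main__":
    cmp_calls, key_calls = count_callbacks(1000)
    print("comparator calls:", cmp_calls)
    print("key calls:       ", key_calls)
```

The key function runs once per item, while the comparator runs once per
comparison, and the gap widens as n grows.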
> not being able to do
> the comparison myself forces me to pad numeric values with 0 etc., so
> the sorting works
See Xapian::sortable_serialise(). It's also much more compact than
storing numbers as ASCII strings and can handle floating point numbers.
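
The sketch below shows why plain ASCII numbers sort wrongly as strings, and
one well-known way to build a byte string whose byte-wise order matches
numeric order. It is NOT the encoding Xapian::sortable_serialise() uses
(that format is more compact); it is a swapped-in technique based on the
IEEE 754 bit pattern, and the name sortable_key is hypothetical.

```python
import struct


def sortable_key(value):
    """Encode a float as 8 bytes that compare in numeric order.

    Illustrative only: not the format produced by Xapian.
    """
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    if bits & (1 << 63):
        # Negative: invert every bit so larger magnitudes sort first.
        bits ^= 0xFFFFFFFFFFFFFFFF
    else:
        # Non-negative: set the top bit so these sort after all negatives.
        bits |= 1 << 63
    return struct.pack(">Q", bits)


if __name__ == "__main__":
    # Unpadded decimal strings sort lexicographically, not numerically.
    print(sorted(["9", "10", "2"]))

    values = [10.0, -2.0, 0.5, 9.0, -1000.5, 0.0, 2.0]
    # Sorting by the encoded bytes gives the numeric order, with no padding.
    print(sorted(values, key=sortable_key))
```

With a key like this there is no need to guess a padding width in advance,
and negative and fractional values are handled as well.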
Cheers,
Olly