[Xapian-discuss] writing match deciders / custom handling of terms

Olly Betts olly at survex.com
Tue Nov 11 12:38:04 GMT 2008


2008/11/10 djcb <djcb.bulk at gmail.com>:
> Now, my question is about the MatchDeciders (I think). Suppose I have a
> query to find some messages in my Xapian DB, e.g:
>
>     subject:foo AND flags:A
>
> which would match messages with subject 'foo' that also have flag 'A'
> (having attachments). In the database, flags are just a number. So, I
> need some custom handling of this 'flags:A' term, and match the
> appropriate documents.
>
> Now, it seems(?) that MatchDeciders are the way to go -- but I don't see
> a way to do the custom handling of the flags parameter -- am I missing
> something simple?

The QueryParser doesn't (at least currently) allow you to generate
a MatchDecider - you need to pass it separately, to Enquire::get_mset().

In this case I'd probably just generate a term for each flag at index time
and use QueryParser::add_boolean_prefix().
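A minimal sketch of the "one term per flag" idea, using only the standard
library. The flag bits, the letters and the "XF" prefix are all hypothetical;
in a real indexer each returned string would be passed to
Xapian::Document::add_term(), and the same prefix would be registered with
the QueryParser as a boolean prefix for the "flags" field. Lower-case letters
are used here; whatever letters you pick must match the term the QueryParser
generates for the user's input.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flag bits, standing in for the number stored per message.
enum MsgFlag : unsigned {
    FLAG_ATTACHMENT = 1u << 0,
    FLAG_SEEN       = 1u << 1,
    FLAG_REPLIED    = 1u << 2,
};

// Maps one flag bit to the letter used in its term (assumed naming).
struct FlagName {
    unsigned bit;
    char letter;
};

// Returns one prefixed term per flag that is set, e.g. "XFa" for attachments.
// Prefixing is plain string concatenation.
std::vector<std::string> flag_terms(unsigned flags,
                                    const std::string& prefix = "XF") {
    // An empty prefix would make flag terms collide with ordinary text terms.
    assert(!prefix.empty());
    static const FlagName table[] = {
        {FLAG_ATTACHMENT, 'a'},
        {FLAG_SEEN,       's'},
        {FLAG_REPLIED,    'r'},
    };
    std::vector<std::string> terms;
    for (const FlagName& f : table) {
        if (flags & f.bit) {
            terms.push_back(prefix + f.letter);
        }
    }
    return terms;
}
```

With terms like these in the index, the flag test becomes an ordinary boolean
filter in the query and no MatchDecider is needed.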

> [2] But: there are some things that seem a bit strange though; e.g. there seems
>  to be no API to add the prefix to add_term, requiring me to manually
>  prefix the strings, which seems a bit hackish...

Well, TermGenerator can do prefixing for you.  But it's mostly just string
concatenation anyway.

>  and the Xapian::Sorter
>  which returns a string, which is then sorted; I was expecting something
>  similar to std::less, or GCompareFunc in GLib

The reason for generating the sort key rather than offering a comparator
is mostly down to the number of callbacks required - for a comparator
it's O(n.log(n)) while for generating a sort key it's O(n).

Since n (the number of matching documents) can easily be in the millions,
this can make quite a difference.
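To make the callback counts concrete, here is a rough standard-library sketch
(not Xapian code) that sorts the same data both ways and counts how often the
user-supplied function runs. The Doc struct, the function names and the
zero-padded key format are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a matching document; not a Xapian type.
struct Doc {
    int id;
    std::uint32_t date;
};

// Approach 1: the user supplies a comparator.
// Returns the number of user callbacks, which grows like n log n.
std::size_t sort_with_comparator(std::vector<Doc>& docs) {
    std::size_t calls = 0;
    std::sort(docs.begin(), docs.end(),
              [&calls](const Doc& a, const Doc& b) {
                  ++calls;  // one user callback per comparison
                  return a.date < b.date;
              });
    return calls;
}

// User callback for approach 2: a fixed-width key whose byte order
// matches numeric order (a 32-bit value has at most 10 decimal digits).
std::string make_key(const Doc& d) {
    const std::string digits = std::to_string(d.date);
    assert(digits.size() <= 10);
    return std::string(10 - digits.size(), '0') + digits;
}

// Approach 2: the user supplies a key generator.
// Returns the number of user callbacks, which is exactly n.
std::size_t sort_with_keys(std::vector<Doc>& docs) {
    using Keyed = std::pair<std::string, Doc>;
    std::size_t calls = 0;
    std::vector<Keyed> keyed;
    keyed.reserve(docs.size());
    for (const Doc& d : docs) {
        ++calls;  // one user callback per document
        keyed.emplace_back(make_key(d), d);
    }
    // The library compares the keys itself; no user code runs here.
    std::sort(keyed.begin(), keyed.end(),
              [](const Keyed& a, const Keyed& b) { return a.first < b.first; });
    for (std::size_t i = 0; i < docs.size(); ++i) {
        docs[i] = keyed[i].second;
    }
    return calls;
}
```

Both functions leave the documents in the same date order; only the number of
trips into user code differs.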

>  not being able to do
>  the comparison myself forces me to pad numeric values with 0 etc., so
>  the sorting works

See Xapian::sortable_serialise().  It's also much more compact than
storing numbers as ASCII strings and can handle floating point numbers.
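The idea behind a sortable serialisation can be illustrated with a much
simpler scheme than the one Xapian uses. The sketch below is NOT the
Xapian::sortable_serialise() encoding (which takes a double); it is a
hypothetical fixed-width big-endian packing of a 32-bit unsigned integer,
shown only to demonstrate why a binary key can sort correctly as a string
while being shorter than zero-padded ASCII.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative only: not the encoding Xapian::sortable_serialise() uses.
// Packs a 32-bit unsigned value into 4 bytes, most significant byte first,
// so that comparing the strings byte by byte gives numeric order.
std::string encode_sortable_u32(std::uint32_t value) {
    std::string out(4, '\0');
    for (int i = 3; i >= 0; --i) {
        out[i] = static_cast<char>(value & 0xFFu);
        value >>= 8;
    }
    return out;
}

// Inverse of encode_sortable_u32().
std::uint32_t decode_sortable_u32(const std::string& key) {
    assert(key.size() == 4);
    std::uint32_t value = 0;
    for (unsigned char byte : key) {
        value = (value << 8) | byte;
    }
    return value;
}
```

Here every value takes 4 bytes, against the 10 digits needed to zero-pad any
32-bit number in decimal. In real code you would call
Xapian::sortable_serialise() when storing the value and
Xapian::sortable_unserialise() when reading it back.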

Cheers,
    Olly


