[Xapian-discuss] writing match deciders / custom handling of terms
djcb
djcb.bulk at gmail.com
Tue Nov 11 16:25:13 GMT 2008
Hi,
First of all, thanks for the quick reply.
On Tue, 11 Nov 2008, Olly Betts wrote:
> 2008/11/10 djcb <djcb.bulk at gmail.com>:
> > Now, my question is about the MatchDeciders (I think). Suppose I have a
> > query to find some messages in my Xapian DB, e.g:
> >
> > subject:foo AND flags:A
> >
[....]
> > Now, it seems(?) that MatchDeciders are the way to go -- but I don't see
> > a way to do the custom handling of the flags parameter -- am I missing
> > something simple?
>
> The QueryParser doesn't (at least currently) allow you to generate
> a MatchDecider - you need to add it separately.
>
> In this case I'd probably just generate a term for each flag at index time
> and use QueryParser::set_boolean_prefix().
Hmm... that would indeed work when there are only a handful of flags, but
it does not seem to cover the more general case. For example, another
search criterion would match all mails more recent than three weeks, for
which I'd use something like:
date:3w..
and for message size, maybe:
size:3k..3M
to match messages between 3 KB and 3 MB. I guess I need to do some custom
handling there... is this what is discussed in ticket #220?
http://trac.xapian.org/ticket/220
Now, the 'AuthorValueRangeProcessor' looks easy enough; would something
similar work for my date: / size: above?
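
To make the custom handling concrete, here is a minimal sketch of the parsing
step that a range processor for size: and date: would need. It is plain
Python and does not touch the Xapian API. The function names, the unit
suffixes (k/M read as powers of 1024, d/w as days/weeks), and the convention
that an empty side means "no bound" are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed multipliers; reading "k" as 1024 rather than 1000 is a design choice.
SIZE_UNITS = {"": 1, "k": 1024, "K": 1024, "m": 1024 ** 2, "M": 1024 ** 2}
# Assumed suffixes for relative ages such as "3w".
AGE_UNITS = {"d": timedelta(days=1), "w": timedelta(weeks=1)}


def split_range(expr, field):
    """Split 'field:LOW..HIGH' into the strings (LOW, HIGH)."""
    name, sep, rest = expr.partition(":")
    if name != field or sep == "" or ".." not in rest:
        raise ValueError(f"not a {field}: range: {expr!r}")
    low, _, high = rest.partition("..")
    return low, high


def parse_size(text):
    """Turn '3k' or '3M' into a byte count; '' means no bound (None)."""
    if text == "":
        return None
    match = re.fullmatch(r"(\d+)([kKmM]?)", text)
    if match is None:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * SIZE_UNITS[match.group(2)]


def parse_age(text, now):
    """Turn '3w' into the timestamp that long before `now`; '' means None."""
    if text == "":
        return None
    match = re.fullmatch(r"(\d+)([dw])", text)
    if match is None:
        raise ValueError(f"bad age: {text!r}")
    return now - int(match.group(1)) * AGE_UNITS[match.group(2)]


def parse_size_range(expr):
    """'size:3k..3M' -> (lowest size, highest size) in bytes."""
    low, high = split_range(expr, "size")
    return parse_size(low), parse_size(high)


def parse_date_range(expr, now):
    """'date:3w..' -> (oldest allowed timestamp, newest allowed timestamp)."""
    low, high = split_range(expr, "date")
    return parse_age(low, now), parse_age(high, now)
```

The resulting bounds would still have to be turned into whatever form the
stored values use, so that a range comparison on the stored strings gives the
same answer as a comparison on the numbers.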
> > [2] But: there are some things that seem a bit strange though; e.g. there seems
> > to be no API to add the prefix to add_term, requiring me to manually
> > prefix the strings, which seems a bit hackish...
>
> Well, TermGenerator can do prefixing for you. But it's mostly just string
> concatenation anyway.
Yes -- but that was my point: when I use add_term (I don't want to use
the TermGenerator for known atomic strings), I have to do it by hand,
which requires me to know an internal representation (the prefix) that
other functions rely on. I think it would be nicer to hide that
implementation detail from the programmer.
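
As an illustration of what hiding the prefix could look like on the
application side, here is a small sketch of a helper that builds the term
string in one place. The field names and the prefixes "S" and "XF" are
hypothetical, and the helper does plain concatenation only; it does not
implement any separator convention the index might otherwise follow.

```python
# Hypothetical field-to-prefix table; the real prefixes are whatever the
# application chose at index time.
FIELD_PREFIXES = {"subject": "S", "flags": "XF"}


def make_term(field, value):
    """Return the prefixed term string for a known atomic value."""
    try:
        prefix = FIELD_PREFIXES[field]
    except KeyError:
        raise ValueError(f"unknown field: {field!r}") from None
    # Plain string concatenation, as described in the quoted reply.
    return prefix + value
```

The same table could then be used to register the prefixes with the query
parser, so that indexing and searching cannot drift apart.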
> > and the Xapian::Sorter
> > which returns a string, which is then sorted; I was expecting something
> > similar to std::less, or GCompareFunc in GLib
>
> The reason for generating the sort key rather than offering a comparator
> is mostly down to the number of callbacks required - for a comparator
> it's O(n.log(n)) while for generating a sort key it's O(n).
>
> Since n can easily be millions, this can make quite a difference.
True, but also a bit misleading: the complexity of the whole sorting
operation is O(n.log(n)) in both cases, and creating a million sortable
string representations of some value might be quite expensive. The
overall performance will be dominated by the actual sorting, which can
be faster with a custom comparator than with std::string comparison.
Anyway, I don't have the numbers so I'll trust your judgement... My
comment was mainly about the unexpected API. Overall Xapian seems quite
fast.
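
The two positions can be checked with a small counting experiment. The sketch
below is plain Python, not Xapian: it sorts the same shuffled list once with a
key function and once with a comparator, and counts how often each
user-supplied callback runs. The function name, the zero-padded key, and the
default seed are arbitrary choices for illustration.

```python
import functools
import random


def count_callbacks(n, seed=0):
    """Sort n shuffled integers both ways; return (key_calls, cmp_calls)."""
    values = list(range(n))
    random.Random(seed).shuffle(values)

    key_calls = 0

    def sort_key(value):
        # Called once per element; the sort then compares the keys itself.
        nonlocal key_calls
        key_calls += 1
        return f"{value:010d}"

    cmp_calls = 0

    def compare(a, b):
        # Called once per comparison the sort performs.
        nonlocal cmp_calls
        cmp_calls += 1
        return (a > b) - (a < b)

    by_key = sorted(values, key=sort_key)
    by_cmp = sorted(values, key=functools.cmp_to_key(compare))
    assert by_key == by_cmp == list(range(n))
    return key_calls, cmp_calls
```

The key variant still performs O(n.log(n)) string comparisons internally, so
the total work has the same complexity either way; what differs is how many
times control passes back into user code.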
> > not being able to do
> > the comparison myself forces me to pad numeric values with 0 etc., so
> > the sorting works
>
> See Xapian::sortable_serialise(). It's also much more compact than
> storing numbers as ASCII strings and can handle floating point numbers.
Ok, I'll try that...
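
For reference, a minimal sketch of the underlying problem: decimal strings
sort in the wrong order unless padded, while a fixed-width big-endian encoding
sorts correctly and is shorter. This uses Python's struct module and is not
the format Xapian::sortable_serialise() produces; it only handles non-negative
integers below 2**64, and the function names are made up for this example.

```python
import struct


def padded_key(number, width=10):
    """Zero-padded decimal string; sorts correctly only up to `width` digits."""
    return f"{number:0{width}d}"


def packed_key(number):
    """Big-endian unsigned 64-bit encoding; byte order matches numeric order."""
    return struct.pack(">Q", number)


sizes = [9, 10, 3072, 200]

# Unpadded decimal strings compare character by character, so "10" < "9".
plain_order = sorted(sizes, key=str)
padded_order = sorted(sizes, key=padded_key)
packed_order = sorted(sizes, key=packed_key)
```

Here plain_order comes out wrong, while padded_order and packed_order both
match the numeric order; the packed key needs 8 bytes where the padded one
needs 10.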
Thanks for your help!
Dirk.
--
-----------------------------------------------
Dirk-Jan C. Binnema <djcb at djcbsoftware.nl>
blog: http://www.djcbsoftware.nl/ChangeLog (NL)
http://djcbflux.blogspot.com (EN)
chat: djcb at jabber.org
-----------------------------------------------