[Xapian-discuss] writing match deciders / custom handling of terms

djcb djcb.bulk at gmail.com
Tue Nov 11 16:25:13 GMT 2008


Hi,

First of all, thanks for the quick reply.

On Tue, 11 Nov 2008, Olly Betts wrote:

> 2008/11/10 djcb <djcb.bulk at gmail.com>:
> > Now, my question is about the MatchDeciders (I think). Suppose I have a
> > query to find some messages in my Xapian DB, e.g:
> >
> >     subject:foo AND flags:A
> >

[....]

> > Now, it seems(?) that MatchDeciders are the way to go -- but I don't see
> > a way to do the custom handling of the flags parameter -- am I missing
> > something simple?
> 
> The QueryParser doesn't (at least currently) allow you to generate
> a MatchDecider - you need to add it separately.
> 
> In this case I'd probably just generate a term for each flag at index time
> and use QueryParser::add_boolean_prefix().
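A minimal sketch of the index-time half of that suggestion, using only the standard library: each flag character becomes its own term under a prefix. The prefix "XF", the ':' separator rule and the helper name flag_terms are assumptions for illustration. The resulting strings would be passed to Xapian::Document::add_term(), and the query side would register the same prefix with QueryParser::add_boolean_prefix().

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical prefix for the "flags" field ('X' marks a
// user-defined prefix by Xapian convention).
const std::string kFlagPrefix = "XF";

// One boolean term per flag character, e.g. "AS" -> {"XF:A", "XF:S"}.
// The ':' follows the convention of separating a multi-letter prefix
// from a value that starts with a capital letter (an assumption here:
// check it matches what your QueryParser generates for "flags:A").
std::vector<std::string> flag_terms(const std::string& flags)
{
    std::vector<std::string> terms;
    for (char f : flags) {
        std::string term = kFlagPrefix;
        if (std::isupper(static_cast<unsigned char>(f)))
            term += ':';
        term += f;
        terms.push_back(term);
    }
    return terms;
}
```

Because such terms are registered as boolean prefixes, a query like flags:A acts as a filter rather than changing the ranking.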

Hmmm... that would indeed work if I have only a handful of flags, but it
does not seem to cover the more general case. Another search criterion
would match all mails newer than three weeks, for which I'd use
something like:

	  date:3w..

and for message size, maybe:
    
	 size:3k..3M

to match messages between 3 KB and 3 MB. I guess I need to do some custom
handling there... is this what is discussed in ticket #220?
	http://trac.xapian.org/ticket/220
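A minimal, standard-library-only sketch of the parsing step for size:3k..3M: split the range on "..", then turn each endpoint into a byte count. It assumes k and M are binary multiples (1024 and 1024*1024) and treats an empty endpoint as open-ended. The helper names parse_size and split_range are hypothetical, not part of Xapian.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: parse "3k", "3M" or "512" into a byte count.
// Assumes binary multiples (k = 1024, M = 1024*1024); returns
// std::nullopt for anything it does not understand. Overflow is not
// checked in this sketch.
std::optional<std::uint64_t> parse_size(const std::string& s)
{
    std::size_t i = 0;
    std::uint64_t n = 0;
    while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
        n = n * 10 + static_cast<std::uint64_t>(s[i] - '0');
        ++i;
    }
    if (i == 0)
        return std::nullopt;  // no leading digits (or empty string)
    if (i == s.size())
        return n;             // plain byte count
    if (i + 1 != s.size())
        return std::nullopt;  // only a single-letter suffix is allowed
    switch (s[i]) {
        case 'k': case 'K': return n * 1024;
        case 'M': return n * 1024 * 1024;
        default: return std::nullopt;
    }
}

// Split "3k..3M" into its two endpoints; an empty side means open-ended.
bool split_range(const std::string& range, std::string& begin, std::string& end)
{
    std::size_t dots = range.find("..");
    if (dots == std::string::npos)
        return false;
    begin = range.substr(0, dots);
    end = range.substr(dots + 2);
    return true;
}
```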

Now, the 'AuthorValueRangeProcessor' looks easy enough; would something
similar work for my date: and size: examples above?
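For date:3w.., a similar sketch turns a relative age into an absolute Unix timestamp. The unit letters (h, d, w), the use of Unix timestamps and the name parse_relative_date are assumptions; 'now' is passed in so the result is deterministic. In Xapian, a ValueRangeProcessor receives the begin and end strings by reference and can rewrite them. A processor built on helpers like this one would therefore still need to encode the numbers the same way they were stored at index time, for example with Xapian::sortable_serialise().

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <optional>
#include <string>

// Hypothetical helper: turn a relative age such as "3w" or "10d"
// into an absolute Unix timestamp, counting back from `now`.
// Supported units (an assumption for this sketch): h, d, w.
// Overflow is not checked.
std::optional<std::time_t> parse_relative_date(const std::string& s, std::time_t now)
{
    if (s.size() < 2)
        return std::nullopt;  // need at least one digit and a unit
    std::int64_t n = 0;
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (s[i] < '0' || s[i] > '9')
            return std::nullopt;
        n = n * 10 + (s[i] - '0');
    }
    std::int64_t unit_seconds;
    switch (s.back()) {
        case 'h': unit_seconds = 3600; break;
        case 'd': unit_seconds = 24 * 3600; break;
        case 'w': unit_seconds = 7 * 24 * 3600; break;
        default: return std::nullopt;
    }
    return static_cast<std::time_t>(now - n * unit_seconds);
}
```

For "3w..", the lower bound would be parse_relative_date("3w", now) and the open upper end can simply be left empty.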

> > [2] But: there are some things that seem a bit strange though; e.g. there seems
> >  to be no API to add the prefix to add_term, requiring me to manually
> >  prefix the strings, which seems a bit hackish...
> 
> Well, TermGenerator can do prefixing for you.  But it's mostly just string
> concatenation anyway.

Yes, but that was my point: when I use add_term (I don't want to use
the TermGenerator for known atomic strings), I have to add the prefix by
hand, which requires me to use an internal representation (the prefix)
that other functions understand. I think it would be nicer to hide that
implementation detail from the programmer.
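Without changes to Xapian, one way to contain that detail is to keep the field-to-prefix mapping in a single table that both the indexing code and the QueryParser setup read. This is a sketch only; the table contents, the ':' rule and the helper prefixed_term are assumptions for illustration.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical single source of truth for field name -> term prefix.
// Both the indexer and the query-parser setup should read this table,
// so raw prefixes never leak into the rest of the code.
const std::map<std::string, std::string>& field_prefixes()
{
    static const std::map<std::string, std::string> table = {
        {"subject", "S"},
        {"flags", "XF"},
        {"maildir", "XM"},
    };
    return table;
}

// Build the term to pass to add_term() for a field/value pair.
// Throws on unknown fields so a typo fails loudly instead of
// producing a term no query will ever match.
std::string prefixed_term(const std::string& field, const std::string& value)
{
    auto it = field_prefixes().find(field);
    if (it == field_prefixes().end())
        throw std::invalid_argument("unknown field: " + field);
    std::string term = it->second;
    // Separate a multi-letter prefix from a capitalised value with ':'
    // (convention assumed here; check it matches your QueryParser).
    if (term.size() > 1 && !value.empty() &&
        std::isupper(static_cast<unsigned char>(value[0])))
        term += ':';
    return term + value;
}
```

At query time, a loop over field_prefixes() could call QueryParser::add_prefix() (or add_boolean_prefix() for filter-like fields such as flags), so the raw prefixes only ever appear in one place.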
 
> >  and the Xapian::Sorter
> >  which returns a string, which is then sorted; I was expecting something
> >  similar to std::less, or GCompareFunc in GLib
> 
> The reason for generating the sort key rather than offering a comparator
> is mostly down to the number of callbacks required - for a comparator
> it's O(n.log(n)) while for generating a sort key it's O(n).
>
> Since n can easily be millions, this can make quite a difference.

True, but also a bit misleading: the complexity of the whole sorting
operation is O(n.log(n)) in both cases, and creating a million sortable
string representations of some value might itself be quite expensive.
The overall performance will be dominated by the actual sorting, which
can be faster with a custom comparator than with std::string comparisons.

Anyway, I don't have the numbers so I'll trust your judgement...  My
comment was mainly about the unexpected API. Overall Xapian seems quite
fast.
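To make the callback-count argument concrete, the sketch below sorts the same data twice. The first pass uses a comparator callback that derives a key on every comparison. The second computes each key once and then sorts the keys, which is the approach described above for Xapian::Sorter. It only counts key-function calls; the helper names and the zero-padded key are hypothetical. Which approach is faster in wall-clock time still depends on how expensive the callback is compared with a string comparison, which is the point raised above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts how often the (potentially expensive) key function runs.
struct KeyStats {
    std::size_t key_calls = 0;
};

// Stand-in for an expensive per-document key derivation; here it just
// zero-pads a non-negative integer so string order matches numeric order.
std::string make_key(int value, KeyStats& stats)
{
    ++stats.key_calls;
    std::string s = std::to_string(value);
    return s.size() >= 10 ? s : std::string(10 - s.size(), '0') + s;
}

// Comparator style: the callback runs on every comparison,
// i.e. O(n log n) calls.
std::vector<int> sort_with_comparator(std::vector<int> v, KeyStats& stats)
{
    std::sort(v.begin(), v.end(), [&](int a, int b) {
        return make_key(a, stats) < make_key(b, stats);
    });
    return v;
}

// Sort-key style: one callback per item (O(n) calls),
// then plain string comparisons during the sort.
std::vector<int> sort_with_keys(const std::vector<int>& v, KeyStats& stats)
{
    std::vector<std::pair<std::string, int>> keyed;
    keyed.reserve(v.size());
    for (int x : v)
        keyed.emplace_back(make_key(x, stats), x);
    std::sort(keyed.begin(), keyed.end());
    std::vector<int> out;
    out.reserve(keyed.size());
    for (const auto& kv : keyed)
        out.push_back(kv.second);
    return out;
}
```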
 
> >  not being able to do
> >  the comparison myself forces me to pad numeric values with 0 etc., so
> >  the sorting works
> 
> See Xapian::sortable_serialise().  It's also much more compact than
> storing numbers as ASCII strings and can handle floating point numbers.

Ok, I'll try that... 
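Padding is needed at all because sort keys are compared byte by byte, so "10" sorts before "9". The sketch below shows the problem, the zero-padding workaround, and a fixed-width big-endian encoding of unsigned integers whose byte order also matches numeric order. This is not the encoding sortable_serialise() uses (which, as noted above, also handles floating point numbers); it only illustrates the property such an encoding needs. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Plain decimal: byte-wise order does NOT match numeric order.
std::string decimal_key(std::uint64_t n)
{
    return std::to_string(n);
}

// Zero-padded decimal: works, but needs a fixed maximum width.
std::string padded_key(std::uint64_t n, std::size_t width = 20)
{
    std::string s = std::to_string(n);
    return s.size() >= width ? s : std::string(width - s.size(), '0') + s;
}

// Fixed-width big-endian bytes: 8 bytes instead of up to 20 digits.
// std::string compares chars as unsigned char, so byte order equals
// numeric order.
std::string big_endian_key(std::uint64_t n)
{
    std::string s(8, '\0');
    for (int i = 7; i >= 0; --i) {
        s[i] = static_cast<char>(n & 0xFF);  // least significant byte last
        n >>= 8;
    }
    return s;
}
```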

Thanks for your help!
Dirk.

-- 
-----------------------------------------------
Dirk-Jan C. Binnema <djcb at djcbsoftware.nl>
blog: http://www.djcbsoftware.nl/ChangeLog (NL)
      http://djcbflux.blogspot.com (EN)
chat: djcb at jabber.org
-----------------------------------------------


