[Xapian-discuss] Searching in different fields

Olly Betts olly at survex.com
Wed Dec 27 15:15:55 GMT 2006


On Mon, Dec 25, 2006 at 08:53:29AM -0500, Jim wrote:
> M.J. van der Veen wrote:
> > Now, I want to search for users whose name is something like
> > 'maarten', but can be 'maarten123' or 'smaarten3' as well.
>
> I'm afraid you are partially out of luck.  The Xapian query parser has a
> wildcard (http://xapian.org/docs/queryparser.html) but it's only for
> trailing values.  You could search for maarten* and get maarten123, but
> there doesn't appear to be a way to get smaarten. 

The QueryParser has no support for building a Query matching an
arbitrary substring, but it's certainly possible to build one yourself.
As with any matching problem, you need to decide how much work to do at
indexing time and how much at search time.  So in this case, the obvious
options (at the extremes of the scale) are:

(a) for every term you want to support a substring match for, generate
a prefixed term for every possible substring (so all terms containing
the substring "maarten" would generate XSUBmaarten and you can just
search for this).  This probably isn't going to be a good approach
in general because of the combinatorial explosion of terms required.

(b) at search time turn "*maarten*" (or whatever syntax you want) into
an OR query for all the terms which match the wildcarded pattern by
scanning the list of all terms.  This is how "maarten*" is currently
handled by the QueryParser.  However, for right truncation we only need
to scan a subset of the list of all terms, since the terms sharing a
given prefix sit together in the sorted term list.  If you want to allow
wildcards on the left and right, you might want to cache the term list
in a special purpose data structure for speed (if you only want to
support wildcards on the left, a cheap trick is to reverse all terms
before indexing and then use right truncation on the reversed terms!)
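
To make the trade-off concrete, here is a rough sketch of both options
plus the reversed-term trick.  It is plain Python over an in-memory
sorted list standing in for the database's term list and makes no Xapian
calls at all; the function names are made up for illustration, and the
XSUB prefix is just the example prefix from (a).  In a real setup the
matching terms would then be ORed together into a Query.

```python
from bisect import bisect_left


def substring_terms(term, prefix="XSUB"):
    """Option (a): one prefixed term per distinct substring of `term`.

    A term of length n yields up to n*(n+1)/2 extra terms.
    """
    return {prefix + term[i:j]
            for i in range(len(term))
            for j in range(i + 1, len(term) + 1)}


def scan_substring(all_terms, needle):
    """Option (b): scan every term and keep those containing `needle`."""
    return [t for t in all_terms if needle in t]


def right_truncation(sorted_terms, stem):
    """Terms starting with `stem`.

    Only a contiguous slice of the sorted list is visited: we jump to
    the first candidate and stop at the first term without the prefix.
    """
    hits = []
    i = bisect_left(sorted_terms, stem)
    while i < len(sorted_terms) and sorted_terms[i].startswith(stem):
        hits.append(sorted_terms[i])
        i += 1
    return hits


def left_truncation(sorted_reversed_terms, suffix):
    """Terms ending with `suffix`, found by right truncation over a
    sorted list of reversed terms."""
    hits = right_truncation(sorted_reversed_terms, suffix[::-1])
    return sorted(t[::-1] for t in hits)


# Stand-in for the list of all terms (hypothetical sample data).
terms = sorted(["maarten", "maarten123", "smaarten3", "martin", "xmaarten"])
reversed_terms = sorted(t[::-1] for t in terms)

print(scan_substring(terms, "maarten"))       # "*maarten*"
print(right_truncation(terms, "maarten"))     # "maarten*"
print(left_truncation(reversed_terms, "maarten"))  # "*maarten"
```

With (a) the search itself is a single term lookup (XSUBmaarten), at the
cost of many extra terms per indexed word; with (b) the index stays
small and the cost moves to scanning terms at search time.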

Cheers,
    Olly


