[Xapian-discuss] QueryParser stemming

Olly Betts olly at survex.com
Thu Jun 9 16:42:41 BST 2005


On Thu, Jun 09, 2005 at 11:57:54AM +0100, Tim Brody wrote:
> I'm considering expanding Xapian to cover all of my search fields:
> authors => A
> title => T [stemmed]
> description => D [stemmed]
> date => Y [range?]
> (fulltext => F) [stemmed]
> 
> I would like to allow users to specify a query e.g.
> Brody impact analysis 2004
> 
> If the user isn't explicit with prefixing I need to be able to modify the 
> query terms (e.g. 'Brody' is an author name) to apply stemming and 
> prefixing as appropriate.

I don't follow how you know 'Brody' is meant to be an author name.
Assuming all capitalised words are author names seems likely to
frustrate anyone who doesn't read and memorise the help.

> I don't think I can achieve this with the current Perl bindings. To do 
> title OR description OR fulltext I need to iterate over the terms and 
> add the appropriate prefix for each field. Similarly I will want to stem 
> title/description terms, but leave author terms alone.
> 
> So, is this feasible? Is there a better approach?

For searching over all fields, you can do the work at index time instead
of search time (except for leaving author terms unstemmed), which is
likely to give a faster search.  I'd probably recommend that approach.

So for the author, title, and description fields, you generate both the
prefixed terms and the non-prefixed ones.  The catch is that the
non-prefixed author terms then have to be stemmed, so that they match
stemmed unprefixed query terms.  I don't see an easy way to avoid that.
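
A rough sketch of the index-time approach, written in plain Python rather
than against the Xapian API.  The prefixes A, T and D come from the quoted
message; index_terms() and toy_stem() are hypothetical helpers, and
toy_stem() is only a placeholder for a real stemmer.

```python
import re

# Field prefixes taken from the quoted message.
PREFIXES = {"author": "A", "title": "T", "description": "D"}
# Fields whose prefixed terms are stemmed; author terms are left alone.
STEMMED_FIELDS = {"title", "description"}


def toy_stem(word):
    """Placeholder stemmer (assumption): strips one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def index_terms(doc):
    """Return the set of terms to index for a dict of field -> text."""
    terms = set()
    for field, text in doc.items():
        prefix = PREFIXES[field]
        for word in re.findall(r"\w+", text.lower()):
            stemmed = toy_stem(word)
            # Prefixed term: stemmed only for fields that use stemming.
            terms.add(prefix + (stemmed if field in STEMMED_FIELDS else word))
            # Unprefixed term: always stemmed, author fields included,
            # so that a stemmed unprefixed query term can match it.
            terms.add(stemmed)
    return terms
```

With this scheme an author "Betts" yields the prefixed term "Abetts"
unstemmed, while the unprefixed copy goes through the stemmer like every
other unprefixed term.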

As for not wanting the same stemming strategy for all fields,
QueryParser::add_prefix() should probably take a stem_strategy argument
which overrides the main setting.
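
To make the idea concrete, here is a toy model in plain Python of a
per-prefix stemming override.  It is not Xapian's QueryParser: the class
TinyQueryParser, the stem keyword argument, the "field:word" query syntax
and toy_stem() are all assumptions made up for illustration.

```python
def toy_stem(word):
    """Placeholder stemmer (assumption): strips one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


class TinyQueryParser:
    """Toy stand-in showing a per-prefix stemming override."""

    def __init__(self, stem=True):
        self.default_stem = stem  # the "main setting"
        self.prefixes = {}        # field name -> (term prefix, override)

    def add_prefix(self, field, prefix, stem=None):
        # stem=None means "inherit the main setting".
        self.prefixes[field] = (prefix, stem)

    def parse(self, query):
        """Turn a whitespace-separated query into a list of terms."""
        terms = []
        for token in query.lower().split():
            field, sep, word = token.partition(":")
            if sep and field in self.prefixes:
                prefix, override = self.prefixes[field]
            else:
                # No known field given: treat the whole token as unprefixed.
                prefix, override, word = "", None, token
            stem = self.default_stem if override is None else override
            terms.append(prefix + (toy_stem(word) if stem else word))
        return terms
```

For example, after add_prefix("author", "A", stem=False) and
add_prefix("title", "T"), author terms are passed through untouched while
title and unprefixed terms follow the main setting.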

> Shall I start adding the Internals to Perl's bindings?

The interfaces to the Internals classes are subject to arbitrary change
without notice.  It doesn't make sense to try to wrap them.

Anyway, the binding layer is the wrong place to add this in my view.  We
don't really want to add generic functionality there - that belongs in
the core library where it's accessible to all users.  Wrapping things in
a way more natural to the language is fine - for example lazy lists
instead of iterators.  That's inherently language specific.

> (And what happened to my patches? :-)

I'm working through them.  There are some changes which are good but I
want to generalise.  So far I've made == and != work the same as 'eq'
and 'ne' on all iterators, not just TermIterator - that's all applied
and committed.  Also, being able to use Perl lists which wrap iterators
should be available everywhere really.  I've stalled a little on that
because we're really going to want lazy evaluation for some cases (e.g.
Database::allterms) and I need to read up on how that's done.

Cheers,
    Olly


