[Xapian-discuss] Stemming and Quoted Phrases

Mike Boone boonedocks at gmail.com
Thu Oct 18 01:11:01 BST 2007


On 10/17/07, Olly Betts <olly at survex.com> wrote:
> No - to keep the database size down, TermGenerator only stores
> positional information for unstemmed terms so as things stand, if you
> stemmed the terms, you couldn't keep the positional requirement.

In porting, more or less directly, the code I used with 0.8.5 to
1.0.3, I am not using the TermGenerator. Perhaps I should be. But for
now I've been splitting words myself, stemming them, prefixing a Z
(for 1.0.3), and adding each term with its position, all via
add_posting. On the query end, though, we've been using the QueryParser
with no problems except this one.
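The indexing loop described above might look roughly like the following sketch. It is not the poster's code: `toy_stem` is a hypothetical suffix-stripper standing in for a real stemmer (such as Xapian's Snowball English stemmer), and the tokenizing regex and starting position are assumptions. Each (term, position) pair is what would be handed to `add_posting` on a Xapian document.

```python
import re


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for a real stemmer (not Xapian's algorithm):
    # strip a trailing "ing" or "s" if at least three characters remain.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str, start_pos: int = 1) -> list[tuple[str, int]]:
    # Split into lowercase alphanumeric words, stem each one, add the "Z"
    # prefix and pair it with its position in the text. Each pair is what
    # would be passed to doc.add_posting(term, pos) on a xapian.Document.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [("Z" + toy_stem(w), pos) for pos, w in enumerate(words, start=start_pos)]


print(index_terms("Chemical engineers design plants"))
# [('Zchemical', 1), ('Zengineer', 2), ('Zdesign', 3), ('Zplant', 4)]
```

Note that only the stemmed, Z-prefixed form gets a position here; no unstemmed term is stored at all, which is what keeps the index smaller.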

> But I always felt it was wrong that quoted phrases were subject to
> stemming before.  Do you have some examples where it makes more sense to
> stem them?

It might be nice to search an exact phrase, but searching a stemmed
phrase is good enough for my purposes, and it eliminates the need to
index the full (unstemmed) terms as well. In my application, a potential
user would probably be happy if a search for "chemical engineering" also
brought up results like "chemical engineers".
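A rough sketch of why positions on stemmed terms are enough for this case: if both the document and the phrase are reduced to Z-prefixed stems, a phrase check only needs those stems at consecutive positions. The `toy_stem` helper and the position dictionary are hypothetical stand-ins; Xapian performs its own phrase matching (via `Query::OP_PHRASE`), not code like this.

```python
import re
from collections import defaultdict


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for a real stemmer (not Xapian's algorithm).
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_terms(text: str) -> list[str]:
    # Lowercase, split into words, stem each and add the "Z" prefix.
    return ["Z" + toy_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def build_positions(text: str) -> dict[str, list[int]]:
    # Map each Z-prefixed stem to the positions (starting at 1) where it occurs.
    positions: dict[str, list[int]] = defaultdict(list)
    for pos, term in enumerate(stemmed_terms(text), start=1):
        positions[term].append(pos)
    return positions


def phrase_matches(phrase: str, positions: dict[str, list[int]]) -> bool:
    # The phrase matches if its stemmed terms occur at consecutive positions.
    terms = stemmed_terms(phrase)
    if not terms:
        return False
    for start in positions.get(terms[0], []):
        if all(start + i in positions.get(t, []) for i, t in enumerate(terms)):
            return True
    return False


doc = build_positions("Chemical engineers design plants")
print(phrase_matches("chemical engineering", doc))  # True
print(phrase_matches("engineering chemical", doc))  # False
```

Because "engineering" and "engineers" reduce to the same stem, the phrase still matches, while the position check continues to enforce word order.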

It would be nice if there were some flag directive where I could tell
the query parser to both stem and set the word positions. Maybe that
already exists, but I'm not sure which flag(s) to use.


