[Xapian-discuss] Proper noun stemming

Olly Betts olly at survex.com
Mon Mar 31 04:22:56 BST 2008

On Thu, Mar 27, 2008 at 12:05:05PM +0000, Colin Bell wrote:
> I was wondering if anyone had a solution for the following problem.
> I use QueryParser to stem my documents before adding them to a
> database.

As James points out, TermGenerator is more appropriate for indexing.

> During the stemming process I would like to find a way of  
> keeping proper nouns that span two or more words together as a phrase.  
> For example "New York" or "Gordon Brown" or "Prime Minister" get split
> up. I see the STEM_SOME allows some operators, but I can't see how  
> these might help in this situation.

When parsing queries, you could achieve special handling of noun phrases
using multi-word synonyms:


Currently, you need to augment the terms indexed by TermGenerator for
this to work.  E.g. you could add a synonym mapping "New York" to the
term "new york" (with a space in it), and also check documents for such
phrases and add them as terms at index time.
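
A minimal sketch of the index-time half of that idea, in plain Python so
it runs without Xapian installed.  The phrase list and the function name
find_phrase_terms are made up for illustration; a real phrase list would
come from a gazetteer or a named-entity tagger.

```python
import re

# Hypothetical list of noun phrases to keep together (assumption, not
# something Xapian provides).
PHRASES = ["New York", "Gordon Brown", "Prime Minister"]


def find_phrase_terms(text, phrases=PHRASES):
    """Return one lowercased, space-joined term per known phrase in text."""
    # Reduce the text to lowercase words joined by single spaces, so line
    # breaks and repeated spaces inside a phrase do not prevent a match.
    # Punctuation is dropped too, so a phrase can match across a sentence
    # boundary; a real implementation may want to guard against that.
    words = re.findall(r"\w+", text.lower())
    normalised = " " + " ".join(words) + " "
    found = []
    for phrase in phrases:
        term = " ".join(re.findall(r"\w+", phrase.lower()))
        # Pad with spaces so only whole-word matches count.
        if term and " " + term + " " in normalised and term not in found:
            found.append(term)
    return found
```

Each returned term (e.g. "new york") could then be added to the document
with Document.add_term() after TermGenerator has indexed the text.  On
the query side, the matching synonym would be stored with
WritableDatabase.add_synonym() and the QueryParser flag
FLAG_AUTO_MULTIWORD_SYNONYMS enabled; check the details against the
documentation for your Xapian version.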

