[Xapian-discuss] Proper noun stemming

Olly Betts olly at survex.com
Mon Mar 31 04:22:56 BST 2008


On Thu, Mar 27, 2008 at 12:05:05PM +0000, Colin Bell wrote:
> I was wondering if anyone had a solution for the following problem.
> 
> I use QueryParser to stem my documents before adding them to a
> database.

As James points out, TermGenerator is more appropriate for indexing.

> During the stemming process I would like to find a way of  
> keeping proper nouns that span two or more words together as a phrase.  
> For example "New York" or "Gordon Brown" or "Prime Minister" get split  
> up. I see the STEM_SOME allows some operators, but I can't see how  
> these might help in this situation.

When parsing queries, you could achieve special handling of noun phrases
using multi-word synonyms:

http://xapian.org/docs/synonyms.html

Currently, for this to be useful you also need to augment the terms
indexed by TermGenerator.  E.g. you could make "New York" a synonym for
the term "new york" (with a space in it), and also check documents for
such phrases at index time and add them as terms.
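
As a rough sketch of the "check documents for such phrases" step (plain
Python, standard library only, nothing Xapian-specific): the function
name find_phrase_terms and the NOUN_PHRASES list are made up for
illustration, and a real phrase list would come from your own data.

```python
import re

# Hypothetical phrase list, for illustration only.
NOUN_PHRASES = ["New York", "Gordon Brown", "Prime Minister"]


def find_phrase_terms(text, phrases=NOUN_PHRASES):
    """Return the lowercased multi-word terms whose phrase occurs in text.

    Matching is case-insensitive, on whole words, and tolerates any run
    of whitespace between the words of a phrase.
    """
    found = []
    for phrase in phrases:
        words = [re.escape(w) for w in phrase.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            # Normalise to a single-space, lowercase term such as "new york".
            term = " ".join(phrase.lower().split())
            if term not in found:
                found.append(term)
    return found
```

Each string returned could then be added to the document as an extra
term (e.g. with Document's add_term()) alongside whatever TermGenerator
indexes, so that the multi-word synonym has something to match.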

Cheers,
    Olly
