[Xapian-discuss] Proper noun stemming
olly at survex.com
Mon Mar 31 04:22:56 BST 2008
On Thu, Mar 27, 2008 at 12:05:05PM +0000, Colin Bell wrote:
> I was wondering if anyone had a solution for the following problem.
> I use QueryParser to stem my documents before adding them to a
As James points out, TermGenerator is more appropriate for indexing.
> During the stemming process I would like to find a way of
> keeping proper nouns that span two or more words together as a phrase.
> For example "New York" or "Gordon Brown" or "Prime Minister" get split
> up. I see the STEM_SOME allows some operators, but I can't see how
> these might help in this situation.
When parsing queries, you could achieve special handling of noun phrases
using multi-word synonyms:
Currently, for this to be useful, you need to augment the terms indexed
by TermGenerator. For example, you could map "New York" as a synonym to
the term "new york" (with a space in it), and also check documents for
such phrases and add them as terms at index time.
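
The index-time half of this can be sketched in Python as below. This is
only a rough illustration, not anything Xapian ships: the phrase list and
the phrase_terms() helper are made up for the example, and it assumes a
simple lowercase word match is good enough to spot the phrases.

```python
import re

# Hypothetical phrase list (an assumption for this sketch); in practice
# it would come from your own data or a named-entity tool.
NOUN_PHRASES = {"new york", "gordon brown", "prime minister"}


def phrase_terms(text, phrases=NOUN_PHRASES):
    """Return each known phrase found in text as one lowercased term.

    The returned terms contain a space, e.g. "new york", matching the
    term the synonym would point at on the query side.
    """
    words = re.findall(r"\w+", text.lower())
    # Longest phrase length in words, so we know which n-grams to try.
    max_len = max((len(p.split()) for p in phrases), default=0)
    found = []
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            if candidate in phrases and candidate not in found:
                found.append(candidate)
    return found
```

Each returned term would then be added to the document alongside the
terms TermGenerator produces, for instance with Document.add_term(), so
that a query-side synonym expanding to "new york" has something to match.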