[Xapian-discuss] Proper noun stemming
James Aylett
james-xapian at tartarus.org
Thu Mar 27 13:08:13 GMT 2008
On Thu, Mar 27, 2008 at 12:47:33PM +0000, Colin Bell wrote:
> >As one of the above documents says, the convention is to store
> >unstemmed forms with positional information, so the proximity of
> >'Gordon' to 'Brown' is retained in the database, and PHRASE and NEAR
> >searches will be able to take advantage of that. (So the search
> >'meeting "Gordon Brown"' should match the above well.)
>
> This sounds ideal. Storing "Gordon" "Brown" and "Gordon Brown" and
> linking them is a great solution. The only trick is picking out proper
> nouns like "Gordon Brown" or "Prime Minister" during the stemming
> process to store them as phrases. Will TermGenerator be able to do
> this? I'm going through the docs on this right now.

No, TermGenerator doesn't pick out proper nouns or phrases at all. It
will store "Gordon" and "Brown" as separate terms with appropriate
positional information, so that phrase searches work. In most cases
there isn't a good reason to store "Gordon Brown" as a term of its own.
Have a think about what *queries* you want to support, and then figure
out if the TermGenerator/QueryParser pairing will achieve that.
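
To illustrate why positional information alone is enough for phrase
searches, here is a toy sketch in plain Python. It is not Xapian's code
or API: index_text and phrase_match are hypothetical helpers made up for
this example, the index is just an in-memory dict, and no stemming is
done. It only shows the idea that single words plus their positions let
you answer a query like "Gordon Brown" without ever storing the phrase.

```python
import re
from collections import defaultdict


def index_text(index, doc_id, text):
    """Record each lowercased word of `text` with its word position.

    Hypothetical helper: a stand-in for the idea of indexing with
    positions, not Xapian's TermGenerator.
    """
    for position, word in enumerate(re.findall(r"\w+", text.lower())):
        index[word][doc_id].append(position)


def phrase_match(index, phrase):
    """Return ids of documents where the words of `phrase` are adjacent.

    Hypothetical helper: a stand-in for a phrase query, not Xapian's
    QueryParser.
    """
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return set()
    matches = set()
    # Start from every occurrence of the first word, then check that each
    # following word sits at the next position in the same document.
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            if all(start + offset in index.get(word, {}).get(doc_id, ())
                   for offset, word in enumerate(words[1:], start=1)):
                matches.add(doc_id)
                break
    return matches


# word -> document id -> list of positions
index = defaultdict(lambda: defaultdict(list))
index_text(index, 1, "Gordon Brown chaired the meeting")
index_text(index, 2, "Gordon met Mr Brown at the meeting")

print(phrase_match(index, "Gordon Brown"))  # {1}
```

Both documents contain "Gordon" and "Brown", but only the first has them
next to each other, so only it matches the phrase. Nothing called
"Gordon Brown" was ever stored.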
J
--
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org