[Xapian-discuss] Proper noun stemming

James Aylett james-xapian at tartarus.org
Thu Mar 27 13:08:13 GMT 2008


On Thu, Mar 27, 2008 at 12:47:33PM +0000, Colin Bell wrote:

> >As one of the above documents says, the convention is to store
> >unstemmed forms with positional information, so the proximity of
> >'Gordon' to 'Brown' is retained in the database, and PHRASE and NEAR
> >searches will be able to take advantage of that. (So the search
> >'meeting "Gordon Brown"' should match the above well.)
> 
> This sounds ideal. Storing "Gordon" "Brown" and "Gordon Brown" and  
> linking them is a great solution. The only trick is picking out proper  
> nouns like "Gordon Brown" or "Prime Minister" during the stemming  
> process to store them as phrases. Will TermGenerator be able to do  
> this? I'm going through the docs on this right now.

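No, TermGenerator doesn't try to pick out proper nouns or multi-word
phrases at all. It will store "Gordon" and "Brown" with appropriate
positional information so that phrase searches work. Since a phrase
query for "Gordon Brown" can be answered from those positions, in most
cases there isn't a good reason to store "Gordon Brown" as a term of
its own.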
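
To illustrate why positions are enough, here is a toy positional index
in plain Python. It is only a sketch of the idea, not how Xapian
implements it: build_index and phrase_match are made-up helper names,
the tokenising is a bare whitespace split, and there is no stemming.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def phrase_match(index, phrase):
    """Return sorted ids of docs in which the words of `phrase` are adjacent."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    # Only documents containing every word can match the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in candidates:
        # A match needs word i at position start + i for every i.
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample documents, for illustration only.
docs = {
    1: "Gordon Brown chaired the meeting",
    2: "Brown bread for Gordon",
}
index = build_index(docs)
print(phrase_match(index, "Gordon Brown"))  # prints [1]
```

Only the single words are stored, yet the phrase lookup still tells the
two documents apart, because document 2 has both words but not next to
each other.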

Have a think about what *queries* you want to support, and then figure
out if the TermGenerator/QueryParser pairing will achieve that.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


