[Xapian-discuss] Proper noun stemming

Colin Bell colinabell at gmail.com
Thu Mar 27 15:07:02 GMT 2008


As per my previous mail, I created a test SimpleStopper:

// Test stopper holding a few common English stopwords.
Xapian::SimpleStopper myStopper;
myStopper.add("a");
myStopper.add("and");
myStopper.add("the");
myStopper.add("in");

and used it with

// Wrap the UTF-8 input text in an iterator for the indexer.
Xapian::Utf8Iterator myIterator;
myIterator.assign(text);
Xapian::TermGenerator indexer;
indexer.set_stemmer(Xapian::Stem("english"));
indexer.set_stopper(&myStopper);
// FLAG_SPELLING adds spelling data to the database given to set_database().
indexer.set_database(database);
indexer.set_document(doc);
indexer.set_flags(Xapian::TermGenerator::FLAG_SPELLING);
indexer.index_text(myIterator);

and it does not remove any of the stopwords at all. I tried without
the stopper set and the results were identical, so I'm pretty sure
both of my stoppers are being ignored by TermGenerator.

Does anyone know why? What am I doing wrong?

Many thanks

Colin

On 27 Mar 2008, at 13:08, James Aylett wrote:

> On Thu, Mar 27, 2008 at 12:47:33PM +0000, Colin Bell wrote:
>
>>> As one of the above documents says, the convention is to store
>>> unstemmed forms with positional information, so the proximity of
>>> 'Gordon' to 'Brown' is retained in the database, and PHRASE and NEAR
>>> searches will be able to take advantage of that. (So the search
>>> 'meeting "Gordon Brown"' should match the above well.)
>>
>> This sounds ideal. Storing "Gordon" "Brown" and "Gordon Brown" and
>> linking them is a great solution. The only trick is picking out proper
>> nouns like "Gordon Brown" or "Prime Minister" during the stemming
>> process to store them as phrases. Will TermGenerator be able to do
>> this? I'm going through the docs on this right now.
>
> No, it doesn't do that at all. It will store "Gordon" and "Brown" with
> appropriate positional information so that phrase searches work. In
> most cases there isn't a good reason to store "Gordon Brown" at all.
>
> Have a think about what *queries* you want to support, and then figure
> out if the TermGenerator/QueryParser pairing will achieve that.
>
> J
>
> -- 
> /--------------------------------------------------------------------------\
>  James Aylett                                                  xapian.org
>  james at tartarus.org                               uncertaintydivision.org
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
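
To illustrate James's point above that storing "Gordon" and "Brown" with
positions is enough for phrase matching, here is a minimal sketch of a toy
positional index. It is not Xapian code: PositionalIndex, index_text and
matches_phrase are hypothetical names made up for this illustration, the
text is split on whitespace only, and Xapian's real postings are stored
quite differently.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy positional index for a single document: term -> word positions.
// Hypothetical illustration only; not how Xapian stores its postings.
using PositionalIndex = std::map<std::string, std::vector<int>>;

// Split the text on whitespace and record each word's 1-based position.
PositionalIndex index_text(const std::string& text) {
    PositionalIndex index;
    std::istringstream words(text);
    std::string word;
    int position = 0;
    while (words >> word) {
        index[word].push_back(++position);
    }
    return index;
}

// True if `second` occurs at the position immediately after `first`,
// i.e. the two separately stored terms form a two-word phrase.
bool matches_phrase(const PositionalIndex& index,
                    const std::string& first,
                    const std::string& second) {
    auto a = index.find(first);
    auto b = index.find(second);
    if (a == index.end() || b == index.end()) return false;
    for (int pos : a->second) {
        for (int next : b->second) {
            if (next == pos + 1) return true;
        }
    }
    return false;
}
```

With the text "meeting with Gordon Brown in London" indexed this way,
matches_phrase(index, "Gordon", "Brown") is true and
matches_phrase(index, "Brown", "Gordon") is false, although no combined
"Gordon Brown" term was ever stored.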
