[Xapian-discuss] Problem with stop words by indexing

emmanuel at engelhart.org emmanuel at engelhart.org
Thu May 27 14:20:36 BST 2010


On Thu 15/04/10 02:36, "Olly Betts" olly at survex.com wrote:
> On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote:
> > I try to remove stop words during the index process and I have no stemming.
> > I have tried with a simple example but it does not work at all.
> >
> > I have my writableDatabase and my termGenerator (indexer) and they work
> > well both together: I can index texts and search through the database
> > correctly.
> > 
> > But if I add (before indexing my texts):
> > Xapian::SimpleStopper stopper;
> > stopper.add("testword");
> > indexer.set_stopper(&stopper);
> > 
> > ... the result is exactly the same as before. I have checked with delve
> > and "testword" is indexed.
> 
> http://article.gmane.org/gmane.comp.search.xapian.general/7571
> Looks like I failed to add that note to the API docs - now done.
> 
> This ought to be more configurable, as should some other things in
> TermGenerator.  I'm thinking we should look at how to improve TermGenerator
> in 1.3.x.

The 1.3.x release is a bit too far away for my use case (I am speaking here only about the ability to remove unstemmed stop words).

I have changed the default value of stop_mode (in termgenerator_internal.cc, line 129) from STOPWORDS_INDEX_UNSTEMMED_ONLY to STOPWORDS_IGNORE, and Xapian now does exactly what I want.

Wouldn't it be possible to simply add a "stopper_strategy" property to the TermGenerator class (or to the stopper class), with a method to modify it, such as set_stopper_strategy()?
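
To make the idea concrete, here is a minimal, self-contained sketch of what such a setting could look like. It does not use Xapian itself: ToyTermGenerator, StopperStrategy and add_stop_word() are hypothetical names invented for illustration, and only set_stopper_strategy() echoes the method name suggested above. The two enum values loosely mirror the internal STOPWORDS_INDEX_UNSTEMMED_ONLY and STOPWORDS_IGNORE constants; since the toy class has no stemmer, the first one simply keeps every word.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical strategy values (not part of the Xapian API).
enum class StopperStrategy {
    IndexUnstemmedOnly,  // stop words still produce unstemmed terms
    Ignore               // stop words produce no terms at all
};

// Toy stand-in for a term generator: splits on whitespace and lowercases.
class ToyTermGenerator {
    std::set<std::string> stop_words_;
    StopperStrategy strategy_ = StopperStrategy::IndexUnstemmedOnly;

  public:
    void add_stop_word(const std::string& word) { stop_words_.insert(word); }

    // The proposed setter: choose what happens to stop words at index time.
    void set_stopper_strategy(StopperStrategy s) { strategy_ = s; }

    // Returns the unstemmed terms that would be indexed for `text`.
    std::vector<std::string> index_text(const std::string& text) const {
        std::vector<std::string> terms;
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            for (char& c : word) {
                c = static_cast<char>(
                    std::tolower(static_cast<unsigned char>(c)));
            }
            const bool is_stop = stop_words_.count(word) > 0;
            // Only the Ignore strategy drops stop words entirely.
            if (is_stop && strategy_ == StopperStrategy::Ignore) continue;
            terms.push_back(word);
        }
        return terms;
    }
};
```

With "testword" added as a stop word, index_text("some testword here") returns three terms under the default strategy, and only "some" and "here" once set_stopper_strategy(StopperStrategy::Ignore) has been called.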

Emmanuel



