[Xapian-discuss] Rqt for Features
Tim Brody
tdb01r at ecs.soton.ac.uk
Mon Jul 12 18:14:57 BST 2004
----- Original Message -----
From: "Richard Boulton" <richard at tartarus.org>
> > Of course if I could wave a magic wand I would modify QueryParser's API
> > anyway .... :-)
>
> Certainly, it is weird to have "set_stemming_options()" take a stopper:
> I'd like to see that fixed. It also has a load of public members which
> really should be private...
>
> Additionally, I'd like to see some code for indexing a chunk of text in
> a manner compatible with the query parser put into a library.
> Currently, the easiest approach for application writers is to cut and
> paste blocks of code from omindex...
>
> Patches for any of these things would be most welcome - but discussion
> and other suggestions are also appreciated.
Here's a completely untested (but probably compiles) patch for the header:
http://santos.ecs.soton.ac.uk/queryparser.h.patch
If I can get bison up to date I will test it, but I suspect the revision
needed is more complex than I know how to do.
I *guess* that Stem should be called Stemmer to be consistent (e.g. Indexer,
MSetIterator etc.)
This causes a segfault (because QP's destructor destroys the stemmer):
QueryParser qp;  // not "qp()", which would declare a function
Stem stem("english");
qp.stemmer = &stem;  // stack object, later destroyed by QP's destructor
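To make the ownership problem concrete, here is a minimal sketch of one way it could be avoided. ToyStem and ToyParser are hypothetical stand-ins, not the real Xapian classes, and I am assuming the crash comes from the parser destroying a pointer to an object it never allocated. In the sketch the parser keeps its own copy, so the caller's stack object is never touched by the destructor.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a stemmer; NOT the real Xapian Stem class.
struct ToyStem {
    explicit ToyStem(std::string lang) : language(std::move(lang)) {}
    std::string language;
};

// Hypothetical stand-in for the parser. It copies the stemmer it is given
// and is the sole owner of that copy, so nothing the caller owns is freed.
class ToyParser {
public:
    void set_stemmer(const ToyStem& s) { stemmer_.reset(new ToyStem(s)); }
    bool has_stemmer() const { return stemmer_ != nullptr; }
    // Precondition: has_stemmer() is true.
    const std::string& stemmer_language() const { return stemmer_->language; }
private:
    std::unique_ptr<ToyStem> stemmer_;  // released automatically on destruction
};

// Usage: the stack object and the parser's copy have independent lifetimes,
// so both can be destroyed at the end of the scope without a double free.
inline std::string demo_language() {
    ToyParser qp;
    ToyStem stem("english");
    qp.set_stemmer(stem);
    return qp.stemmer_language();
}
```

Passing by const reference and copying is only one option; sharing ownership or documenting that the caller keeps ownership would also remove the crash, at the cost of a different API.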
Is it preferable to pass a language string or an object to QP? My naive
opinion is that objects should be passed.
Should stop terms be applied before or after stemming? (I think it's
currently before.)
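A small sketch of why the order matters, using a toy stemmer that just strips one trailing 's' (an assumption for illustration only, nothing like a real stemming algorithm). All function names here are hypothetical. With a stop list of unstemmed words, stemming first can change a stop word into something the list no longer matches.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stemmer (assumption): strips a single trailing 's'.
inline std::string toy_stem(const std::string& w) {
    if (w.size() > 1 && w.back() == 's') return w.substr(0, w.size() - 1);
    return w;
}

// Drop stop words first, then stem whatever survives.
inline std::vector<std::string> stop_then_stem(
        const std::vector<std::string>& words,
        const std::set<std::string>& stop) {
    std::vector<std::string> out;
    for (const auto& w : words)
        if (stop.count(w) == 0) out.push_back(toy_stem(w));
    return out;
}

// Stem first, then compare the stems against the same unstemmed stop list.
inline std::vector<std::string> stem_then_stop(
        const std::vector<std::string>& words,
        const std::set<std::string>& stop) {
    std::vector<std::string> out;
    for (const auto& w : words) {
        const std::string s = toy_stem(w);
        if (stop.count(s) == 0) out.push_back(s);
    }
    return out;
}
```

For the words "this", "is", "cats" and the stop list {"this", "is"}, stopping first leaves only "cat", while stemming first lets "thi" and "i" through because the stems no longer match the list. So stopping after stemming would seem to need a stop list that has itself been stemmed.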
Is there a central configuration for languages, i.e. somewhere closer to
Stem that stopwords could be placed so that those adding language support
don't need to change multiple header files?
I would be happy to do some documenting too (having had to read the C++ code
to understand what prefixes was for ...), but I suspect QP could use some
root-canal work :-)
All the best,
Tim.