[Xapian-discuss] Rqt for Features
Richard Boulton
richard at tartarus.org
Fri Jul 9 17:06:00 BST 2004
Tim Brody wrote:
> Having added wrappers for QueryParser I wonder whether it would be
> worthwhile revising Stopper. I can't think of a situation where a stopper
> would need to be more intelligent than containing a list of words to stop,
> so it seems a little pointless to distribute a class in Xapian that doesn't do
> this.
I think the actual process of stopping is always going to be this
simple, but the selection of words to stop isn't necessarily so simple.
In particular, it would be useful to have prebuilt lists of common
stopwords for (at least) each of the languages which we provide stemmers
for. The user might then create, for example, a StandardStopper object,
passing the name of a language, rather than having to keep a list of
words in their application.
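A `StandardStopper` that takes a language name might be sketched roughly as below. This is only an illustration under stated assumptions, not Xapian code: it does not derive from any Xapian base class, the word lists are tiny samples rather than real stopword lists, and throwing `std::invalid_argument` for an unknown language is an arbitrary choice.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical StandardStopper: picks a built-in stopword list by language
// name. The lists below are tiny illustrative samples, not curated lists.
class StandardStopper {
  public:
    explicit StandardStopper(const std::string& language) {
        if (language == "english") {
            words_ = {"a", "an", "and", "of", "the", "to"};
        } else if (language == "french") {
            words_ = {"de", "et", "la", "le", "les", "un"};
        } else {
            // Assumption: an unknown language is treated as an error.
            throw std::invalid_argument("no stopword list for " + language);
        }
    }

    // Stopper-style functor: returns true if the term should be dropped.
    bool operator()(const std::string& term) const {
        return words_.count(term) != 0;
    }

  private:
    std::set<std::string> words_;
};
```

The stopping itself stays a single set lookup; all the real effort would go into curating one list per supported stemmer language.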
However, there's a strong argument for providing a class such as yours
as part of Xapian, since it would be useful to many users. Could you
add this to Bugzilla too, so it won't get forgotten?
> Of course if I could wave a magic wand I would modify QueryParser's API
> anyway .... :-)
QueryParser is a great deal less polished than other parts of Xapian's
interface, which is partly why it lives in a library of its own. It was
originally written for a specific application (Omega) and later
extracted into that separate library, but it is due for a good look.
In other words, its API is open for discussion.
Certainly, it is weird to have "set_stemming_options()" take a stopper:
I'd like to see that fixed. It also has a load of public members which
really should be private...
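One possible shape for that fix is sketched below. It is a hypothetical class, not QueryParser's real interface: `SketchQueryParser`, `set_stopper()`, `is_stopped()`, `stem_language()` and `stem_all()` are invented for illustration, and the parameters given to `set_stemming_options()` are assumptions. The only point it makes is that the stopper gets its own setter and the configuration lives in private members.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical parser configuration: stemming and stopping are set
// independently, and all state is private behind member functions.
class SketchQueryParser {
  public:
    using Stopper = std::function<bool(const std::string&)>;

    // Assumed parameters: a stemmer language and a "stem every term" flag.
    void set_stemming_options(const std::string& language, bool stem_all) {
        stem_language_ = language;
        stem_all_ = stem_all;
    }

    // The stopper no longer rides along with the stemming options.
    void set_stopper(Stopper stopper) { stopper_ = std::move(stopper); }

    // True if a stopper is set and it says to drop this term.
    bool is_stopped(const std::string& term) const {
        return stopper_ && stopper_(term);
    }

    const std::string& stem_language() const { return stem_language_; }
    bool stem_all() const { return stem_all_; }

  private:
    std::string stem_language_;
    bool stem_all_ = false;
    Stopper stopper_;  // empty means "no stopping"
};
```

Using a `std::function` here is a design choice for the sketch; a virtual base class, as Xapian uses for its stopper, would serve equally well.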
Additionally, I'd like to see code for indexing a chunk of text, in a
manner compatible with the query parser, put into a library.
Currently, the easiest approach for application writers is to cut and
paste blocks of code from omindex...
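The core requirement for such a library can be sketched: the indexer must split and normalise words exactly as the query parser does, or terms produced at index time will never match terms produced at query time. In this sketch `index_text()` and `normalise_word()` are hypothetical names, the splitting rule (ASCII letters and digits only) and plain lower-casing are assumptions, and no stemming or stopping is done. In a real indexer each (term, position) pair would typically be passed to `Xapian::Document::add_posting()`, and the query parser would have to call the same `normalise_word()`.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical normalisation shared by indexer and query parser:
// here it is just ASCII lower-casing.
std::string normalise_word(const std::string& word) {
    std::string out;
    for (unsigned char c : word) out += static_cast<char>(std::tolower(c));
    return out;
}

// Split text on non-alphanumeric characters and return (term, position)
// pairs, with positions counted from 1.
std::vector<std::pair<std::string, unsigned>> index_text(const std::string& text) {
    std::vector<std::pair<std::string, unsigned>> postings;
    unsigned pos = 0;
    std::string word;
    // Emit the word collected so far, if any.
    auto flush = [&] {
        if (!word.empty()) {
            postings.emplace_back(normalise_word(word), ++pos);
            word.clear();
        }
    };
    for (unsigned char c : text) {
        if (std::isalnum(c)) word += static_cast<char>(c);
        else flush();
    }
    flush();
    return postings;
}
```

For example, `index_text("Hello, Xapian world!")` yields `hello`, `xapian` and `world` at positions 1, 2 and 3.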
Patches for any of these things would be most welcome - but discussion
and other suggestions are also appreciated.
--
Richard