[Xapian-discuss] Thesaurus feature?

Olly Betts olly at survex.com
Sun May 13 04:35:54 BST 2007


On Sat, May 12, 2007 at 09:46:54PM -0500, Yannick Warnier wrote:
> I was just having a quick look at Xapian's documentation again and
> wondering... Does Xapian offer some kind of thesaurus functionality?

No, there's no thesaurus feature at present.

> If not, would it be trivial to implement one considering the current
> API, or is that something that might take very long?

Implementing the code to handle a thesaurus probably isn't a major
project - it depends exactly what you're expecting it to do though.
For example, it shouldn't be hard to add an "and synonyms" query operator
so `~facts' might be roughly equivalent to `(facts OR information OR
data OR statistics)'.
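To illustrate the idea, here is a minimal sketch of what such an "and synonyms" operator does: each `~term` is rewritten into an OR over the term and its synonyms. This is a toy string rewrite, not Xapian code. The `THESAURUS` dict and the `expand_synonyms` function are hypothetical, and the synonym list is just the example above. A real implementation inside Xapian would more likely combine subqueries with an OR operator than rewrite query strings.

```python
# Hypothetical thesaurus: maps a term to its synonyms. In a real
# application this data would have to be generated and kept up to date.
THESAURUS = {
    "facts": ["information", "data", "statistics"],
}


def expand_synonyms(query, thesaurus):
    """Rewrite each `~term` token as (term OR syn1 OR syn2 ...).

    Tokens without a leading `~` pass through unchanged, as does a
    `~term` that has no entry in the thesaurus (minus the `~`).
    """
    out = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            term = token[1:]
            alternatives = [term] + thesaurus.get(term, [])
            if len(alternatives) == 1:
                out.append(term)  # no synonyms known: plain term
            else:
                out.append("(" + " OR ".join(alternatives) + ")")
        else:
            out.append(token)
    return " ".join(out)


print(expand_synonyms("~facts about cats", THESAURUS))
# (facts OR information OR data OR statistics) about cats
```

The operator itself is trivial. All the real work is in what goes into the thesaurus.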

The hard part of implementing a thesaurus is often generating the
thesaurus data and especially keeping it up to date.  If your
application is something like a news website, the vocabulary changes on
an almost daily basis, but even in other fields it evolves over time.

> And this should be the topic of another e-mail, but has anybody discussed
> implementing word stemming for East-Asian languages?

There's been some past discussion on this list and the snowball list,
at least for Japanese:

http://search.gmane.org/?query=japanese%20stemming

Cheers,
    Olly
