[Xapian-discuss] Xapian vs Lucene

Yannick Warnier ywarnier at beeznest.org
Sat Jan 27 09:41:07 GMT 2007


Hello,

A title like this probably risks attracting trolls, but has anyone taken
the trouble to compare Lucene with Xapian and list the differences?

As I told the list at the end of last year, I will have to integrate an
indexing/search engine in the coming weeks or months. It will be
integrated into Dokeos, an open-source e-learning application written in
PHP. At the moment we are using MnoGoSearch, which is alright, but the
problem lies in its indexing engine: we cannot really ship it with our
application, as only the Linux version is GPL and it is a C program that
has to be run via cron. Also, the free/collaborative support and
mailing-list activity are a bit too loose/slow.

So far, my understanding is that I can use the Xapian PHP bindings to
index "on the fly" when inserting new content into my e-learning
application. It is also my understanding that Lucene is written in Java
(which is a drawback for me, as it means the Dokeos administrators would
have to deal with more languages than just PHP), that it is quite
popular, and that it does its job alright.

One problem with Lucene that I know of (from a Perl programmer) is that
the Perl bindings do not actually handle Unicode characters, so the
*universality* of Lucene is lost when using it via those bindings.

Of course, Dokeos-wise, it is important to have UTF-8 handling as we
plan to move to full-UTF-8 just before we start integrating the new
indexing...*stuff*.

As far as I am aware, my search application (as a finished/integrated
product) should deal with:
- indexing of webpages
- indexing of documents (all office documents)
- indexing/parsing of XML metadata
- awareness of user permissions (a result should only display if the
searching user is authorized to see it)
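
To make the last requirement concrete, here is a minimal sketch of the
behaviour I have in mind, written in Python only for brevity. The Hit
class, the group names and the visible_hits function are all made up for
illustration and are not part of Xapian, Lucene or Dokeos; a real engine
may well be able to apply such a restriction inside the query itself
rather than filtering afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One search result plus the groups allowed to see it (hypothetical)."""
    title: str
    allowed_groups: set = field(default_factory=set)


def visible_hits(hits, user_groups):
    """Keep only the hits that share at least one group with the user."""
    user_groups = set(user_groups)
    return [hit for hit in hits if hit.allowed_groups & user_groups]


# Made-up sample data standing in for whatever the engine returns.
hits = [
    Hit("Course outline", {"students", "teachers"}),
    Hit("Exam answers", {"teachers"}),
    Hit("Admin report", {"admins"}),
]

student_view = [hit.title for hit in visible_hits(hits, ["students"])]
teacher_view = [hit.title for hit in visible_hits(hits, ["teachers"])]
print(student_view)
print(teacher_view)
```

Filtering after the search is the simplest way to state the requirement,
but it would waste work on large result sets, which is why I would prefer
an engine that can take the permissions into account while searching.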

So, my question is: which is better for my case, Lucene or Xapian? Are
any benchmarks or comparisons available?

Of course, this is specialised advice and I should really post the same
mail to the Lucene list, but I'm not subscribed there yet, so for now I
will analyse the feedback I get from here only (which will obviously
distort it just a little bit).

Thanks a lot,

Yannick



