[Xapian-devel] Lucene 3.6.2 backend for xapian (#25)

Olly Betts olly at survex.com
Thu Oct 31 02:49:44 GMT 2013


On Thu, Oct 31, 2013 at 10:24:26AM +0800, jiangwen jiang wrote:
> Yes, there were two choices at the beginning:
> 1. Using multi-database.
> 2. Treat the lucene database as a single database.
> In the end I chose 2. It was a long time ago and I am not quite sure why
> this decision was made, maybe:
> 1. We can handle multiple lucene databases.

That should be possible with multi-databases - you'd just end up with a
subdatabase for each segment in all of the lucene databases.  If we make
Xapian::Database's constructor create a Database object with one sub
database per segment in the Lucene database, this sort of thing should
just work:

    Xapian::Database db;
    db.add_database(Xapian::Database("/path/to/lucene1"));
    db.add_database(Xapian::Database("/path/to/lucene2"));
    db.add_database(Xapian::Database("/path/to/xapian1"));
    db.add_database(Xapian::Database("/path/to/xapian2"));

> 2. I am not sure whether multi-database can meet the requirements, such as
>    getting the doc_freq (how many documents contain the term) of a
>    particular term. Actually, I want to get the sum of the doc_freq of a
>    particular term over all lucene segments, and I am not sure whether a
>    xapian multi-database does it this way.

A multi-database does sum the "doc_freq" over all subdatabases.
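
As a rough illustration of that summing, here is a minimal toy model in
plain C++. It is not Xapian's implementation: SubDatabase and MultiDatabase
are hypothetical stand-ins, and only the method names add_database() and
get_termfreq() are borrowed from Xapian::Database. With the real API you
would call get_termfreq() on the combined Xapian::Database.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for one sub-database (e.g. one Lucene segment).
// Hypothetical type for illustration only; not a Xapian class.
struct SubDatabase {
    std::map<std::string, unsigned> doc_freq;  // term -> docs containing it

    unsigned get_termfreq(const std::string& term) const {
        auto it = doc_freq.find(term);
        return it == doc_freq.end() ? 0 : it->second;
    }
};

// Toy multi-database: the combined doc_freq of a term is simply the sum
// of its doc_freq in every sub-database.
struct MultiDatabase {
    std::vector<SubDatabase> subs;

    void add_database(const SubDatabase& sub) { subs.push_back(sub); }

    unsigned get_termfreq(const std::string& term) const {
        unsigned total = 0;
        for (const auto& sub : subs) total += sub.get_termfreq(term);
        return total;
    }
};
```

Adding two sub-databases whose doc_freq for a term is 3 and 2 gives a
combined doc_freq of 5 for that term, and a term absent from every
sub-database gives 0.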

In general, multi-databases act just like a single database with the
same contents.  There's one exception - when generating an ESet, you
can ask it to approximate statistics by extrapolating from the first
sub database rather than summing over all of them, but you can also
tell it to calculate the exact statistics instead.  This just offers
a trade-off between speed and exactness.
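
To make the trade-off concrete, here is a small sketch in plain C++ that
compares the two ways of getting a term's frequency. It is a toy model
only: ShardStats and both functions are hypothetical names, and the plain
proportional scaling used for the approximation is an assumption made for
illustration, not necessarily the formula Xapian uses. In the real API the
exact mode is requested by passing the Xapian::Enquire::USE_EXACT_TERMFREQ
flag to Enquire::get_eset().

```cpp
#include <vector>

// Hypothetical per-sub-database statistics for a single term.
struct ShardStats {
    unsigned doc_count;  // documents in this sub-database
    unsigned term_freq;  // of those, documents containing the term
};

// Exact: sum the term's frequency over every sub-database.
unsigned exact_termfreq(const std::vector<ShardStats>& shards) {
    unsigned total = 0;
    for (const auto& s : shards) total += s.term_freq;
    return total;
}

// Approximate: read the term's frequency from the first sub-database only
// and scale it up to the size of the whole collection (assumed to be a
// simple proportional scaling for this sketch).
double approx_termfreq(const std::vector<ShardStats>& shards) {
    if (shards.empty() || shards[0].doc_count == 0) return 0.0;
    unsigned total_docs = 0;
    for (const auto& s : shards) total_docs += s.doc_count;
    return static_cast<double>(shards[0].term_freq) * total_docs
           / shards[0].doc_count;
}
```

The approximation only needs the term's entry in the first sub-database,
which is why it is faster, but it drifts from the exact value whenever the
term is distributed unevenly across the sub-databases.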

> Do you think multi-database is a better way to handle a lucene database?

I think so - it seems a natural fit.  Sorry for not thinking of this
earlier.

Cheers,
    Olly


