[Xapian-devel] Lucene 3.6.2 backend for xapian (#25)

Olly Betts olly at survex.com
Wed Oct 30 22:50:38 GMT 2013


[Replying to xapian-devel, as I think a wider audience would be useful]

On Mon, Oct 21, 2013 at 11:24:51PM +0800, jiangwen jiang wrote:
> Yes, it's less efficient. A Lucene database has multiple segments, and each
> segment can be treated as an independent database. The same term may exist
> in one or more segments.

Sorry for taking a while to respond - I've been both busy and mulling
this over.

I think that perhaps the best way to map this into Xapian is for each
Lucene "segment" to be handled as a database in Xapian, and use the
multi-database support to search them together.

That's likely to need some adjustments to the multi-database support,
but I think otherwise we'll end up duplicating a lot of that machinery
in the Lucene backend anyway.

I've not looked at the Lucene file structure with this in mind yet
though - do you see any obvious problems with this approach?

> Xapian::TermIterator it = db_in.allterms_begin();
> This method traverses all terms in the first segment, then the second
> segment, and so on until the last segment.

Iteration over all terms should return the terms in sorted order (by
byte value) and without duplicates, neither of which is achieved by
handling each segment in turn like this.  But we already handle merging
allterms lists for multiple databases.

Cheers,
    Olly
