[Xapian-devel] Backend for Lucene format indexes-How to get doclength

jiangwen jiang jiangwen127 at gmail.com
Sun Jun 16 05:32:31 BST 2013


Hi, all:

I have written a demo patch for a backend for Lucene format indexes. The
Lucene version is 3.6.2.
http://lucene.apache.org/core/3_6_2/fileformats.html

For now, this demo patch supports only the basic features of Lucene. Compound
files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf), and deleted documents (.del)
are not supported, and the skip list in .fdx is not supported either.

example/quest.cc is used to test this demo. Queries look like this:
field_name:term, or field_name:term1 AND field_name:term2

So far, I have found that some data Xapian needs for BM25 does not exist in
Lucene:
1. doclength_lower_bound, doclength_upper_bound
2. wdf_lower_bound, wdf_upper_bound
3. total_length
4. doclength (for each document)
Items 1-3 are statistics; they can be calculated when running copydatabase
and stored somewhere. But doclength is hard to handle this way.
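
As a rough illustration of how items 1-3 could be gathered in a single pass
during the copy, here is a minimal C++ sketch. Everything in it is
hypothetical: the Posting record, the Stats struct and collect_stats() are
made-up names, not part of Xapian or of the demo patch. It also assumes that
the wdf bounds are global rather than per-term, and that the postings can be
read as (docid, wdf) pairs.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

// Hypothetical posting record: one (document, term) pair together with
// its within-document frequency, as a copy pass might see it.
struct Posting {
    std::uint32_t docid;
    std::uint32_t wdf;
};

// Hypothetical container for the statistics in items 1-3.
struct Stats {
    std::uint64_t total_length = 0;
    std::uint32_t doclength_lower_bound = 0;
    std::uint32_t doclength_upper_bound = 0;
    std::uint32_t wdf_lower_bound = 0;
    std::uint32_t wdf_upper_bound = 0;
};

// One pass over all postings, then one pass over the per-document sums.
// Returns all zeros for an empty input.
Stats collect_stats(const std::vector<Posting>& postings) {
    Stats s;
    if (postings.empty()) return s;

    // Assumption: the length of a document is the sum of its wdf values.
    std::unordered_map<std::uint32_t, std::uint32_t> doclen;
    std::uint32_t wdf_min = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t wdf_max = 0;

    for (const Posting& p : postings) {
        doclen[p.docid] += p.wdf;
        s.total_length += p.wdf;
        wdf_min = std::min(wdf_min, p.wdf);
        wdf_max = std::max(wdf_max, p.wdf);
    }

    std::uint32_t len_min = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t len_max = 0;
    for (const auto& entry : doclen) {
        len_min = std::min(len_min, entry.second);
        len_max = std::max(len_max, entry.second);
    }

    s.doclength_lower_bound = len_min;
    s.doclength_upper_bound = len_max;
    s.wdf_lower_bound = wdf_min;
    s.wdf_upper_bound = wdf_max;
    return s;
}
```

The per-document sums only exist here as an intermediate in memory. Keeping
them somewhere they can be looked up at query time is the part this sketch
does not address.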

1. Can some other data be used instead of doclength?
2. Does Xapian support other ranking algorithms which do not need doclength?
Are there any suggestions for solving this problem?
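
To make the dependency on doclength concrete, here is a minimal sketch of a
textbook BM25 term weight. This is an assumption on my side: it is the
common Okapi form with a "+1" inside the logarithm, not necessarily the
exact formula used by Xapian's BM25Weight. The function name
bm25_term_weight() and its parameter names are made up for illustration.

```cpp
#include <cmath>

// Textbook BM25 weight of one term in one document (illustrative only).
//   wdf            within-document frequency of the term
//   doclength      length of this document
//   avg_doclength  total_length divided by the number of documents
//   termfreq       number of documents containing the term
//   doc_count      number of documents in the database
//   k1, b          the usual BM25 tuning parameters
double bm25_term_weight(double wdf, double doclength, double avg_doclength,
                        double termfreq, double doc_count,
                        double k1, double b) {
    double idf =
        std::log((doc_count - termfreq + 0.5) / (termfreq + 0.5) + 1.0);
    // Length normalisation: the only place where doclength is used.
    double norm = 1.0 - b + b * (doclength / avg_doclength);
    return idf * (wdf * (k1 + 1.0)) / (wdf + k1 * norm);
}
```

In this textbook form, doclength only appears in the normalisation factor,
and that factor becomes 1 when b is 0.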

And the demo patch is here:
https://github.com/white127/xapian-patch/blob/master/xapian_lucene_demo.patch

Regards
Jiang