[Xapian-devel] Backend for Lucene format indexes-How to get doclength
jiangwen jiang
jiangwen127 at gmail.com
Sun Jun 16 05:32:31 BST 2013
Hi, all:
I have written a demo patch that adds a backend for Lucene-format indexes.
It targets Lucene 3.6.2:
http://lucene.apache.org/core/3_6_2/fileformats.html
For now, this demo patch supports only the basic features of Lucene.
Compound files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf), and deleted
documents (.del) are not supported, and the skip list in .fdx is not
supported either.
example/quest.cc is used to test this demo. Queries look like this:
field_name:term, or field_name:term1 AND field_name:term2
So far, I have found that some data Xapian needs for BM25 does not exist in
Lucene:
1. doclength_lower_bound, doclength_upper_bound
2. wdf_lower_bound, wdf_upper_bound
3. total_length
4. doclength (for each document)
Items 1-3 are statistics, so they can be calculated while running
copydatabase and stored somewhere. But doclength is hard to handle this
way.
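To make the idea of gathering these statistics in one pass concrete, here is a
minimal sketch. It assumes each document is available as a list of the wdf
values of its terms, and that a document's length is the sum of those wdfs.
The names DocWdfs, LengthStats, and collect_stats are hypothetical and are not
part of Xapian or of the demo patch, and the wdf bounds are tracked globally
rather than per term to keep the example short.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical input type: the wdf of every term in one document.
using DocWdfs = std::vector<std::uint32_t>;

// Hypothetical container for the statistics listed above.
struct LengthStats {
    std::uint32_t doclength_lower_bound = 0;
    std::uint32_t doclength_upper_bound = 0;
    std::uint32_t wdf_lower_bound = 0;
    std::uint32_t wdf_upper_bound = 0;
    std::uint64_t total_length = 0;
    std::vector<std::uint32_t> doclengths;  // one entry per document
};

// Single pass over all documents, as a copy step could do.
LengthStats collect_stats(const std::vector<DocWdfs>& docs) {
    LengthStats s;
    std::uint32_t min_len = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t min_wdf = std::numeric_limits<std::uint32_t>::max();
    bool seen_wdf = false;
    for (const DocWdfs& wdfs : docs) {
        std::uint32_t len = 0;
        for (std::uint32_t wdf : wdfs) {
            len += wdf;  // assumption: doclength = sum of wdfs
            min_wdf = std::min(min_wdf, wdf);
            s.wdf_upper_bound = std::max(s.wdf_upper_bound, wdf);
            seen_wdf = true;
        }
        min_len = std::min(min_len, len);
        s.doclength_upper_bound = std::max(s.doclength_upper_bound, len);
        s.total_length += len;
        s.doclengths.push_back(len);
    }
    // Leave the lower bounds at 0 when there was nothing to measure.
    if (!docs.empty()) s.doclength_lower_bound = min_len;
    if (seen_wdf) s.wdf_lower_bound = min_wdf;
    return s;
}
```

The doclengths vector is the part that grows with the number of documents,
which is why it is harder to store than the three aggregate values.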
1. Could some other data be used instead of doclength?
2. Does Xapian support another ranking algorithm that does not need
doclength?
Are there any suggestions for solving this problem?
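As background for the second question, the sketch below shows where doclength
enters a textbook BM25 term weight. It is written from the standard formula,
not copied from Xapian's source, and the function name bm25_term_weight and its
parameter list are hypothetical. The point it illustrates is that doclength
appears only in the length normalisation factor, which equals 1 when b is 0.

```cpp
// Textbook BM25 weight for one term in one document (illustrative only).
// idf:     inverse document frequency component for the term
// wdf:     within-document frequency of the term
// doclen:  length of this document
// avg_len: average document length (total_length / number of documents)
// k1, b:   the usual BM25 tuning parameters
double bm25_term_weight(double idf, double wdf, double doclen,
                        double avg_len, double k1, double b) {
    // Length normalisation: the only place doclen is used.
    const double norm = (1.0 - b) + b * (doclen / avg_len);
    return idf * (wdf * (k1 + 1.0)) / (wdf + k1 * norm);
}
```

With b set to 0, two documents with the same wdf get the same weight whatever
their lengths, so a per-document doclength would not affect the ranking in this
simplified form. Whether Xapian's own BM25 implementation still requests
doclength from the backend under such settings would need to be checked against
its source.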
And the demo patch is here:
https://github.com/white127/xapian-patch/blob/master/xapian_lucene_demo.patch
Regards
Jiang