[Xapian-devel] Backend for Lucene format indexes-How to get doclength

jiangwen jiang jiangwen127 at gmail.com
Sun Jun 16 05:41:29 BST 2013


In addition, I set fixed default values for the data that does not exist in
Lucene, so that this demo can run.
The demo is not fully tested.

2013/6/16 jiangwen jiang <jiangwen127 at gmail.com>

> Hi, all:
>
> I have written a demo patch for a backend for Lucene format indexes. The
> Lucene version is 3.6.2.
> http://lucene.apache.org/core/3_6_2/fileformats.html
>
> So far, this demo patch only supports the basic features of Lucene.
> Compound files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf) and
> deleted documents (.del) are not supported, and the skip list in .fdx is
> not supported either.
>
> example/quest.cc is used to test this demo. Queries look like this:
> field_name:term, or field_name:term1 AND field_name:term2
>
> So far, I have found that some data needed for BM25 in Xapian does not
> exist in Lucene:
> 1. doclength_lower_bound, doclength_upper_bound
> 2. wdf_lower_bound, wdf_upper_bound
> 3. total_length
> 4. doclength (for each document)
> Items 1-3 are statistics which can be calculated while running
> copydatabase and then stored somewhere. But doclength is
> hard to handle this way.
>
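
A minimal sketch of how items 1-3 could be gathered in one pass over the
documents, for example while copydatabase reads them. Everything here is an
assumption for illustration: TermFreqs, Bounds, CollectionStats and
compute_stats are hypothetical names, not part of the patch or of Xapian, and
documents are modelled as plain in-memory maps. It relies on Xapian's
definition of document length as the sum of the wdf values of the document's
terms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one document read from the Lucene index:
// maps each term to its within-document frequency (wdf).
typedef std::map<std::string, std::uint32_t> TermFreqs;

struct Bounds {
    std::uint32_t lower;
    std::uint32_t upper;
};

// Statistics from the list above that Lucene does not store.
struct CollectionStats {
    std::uint64_t total_length = 0;
    Bounds doclength_bounds = {0, 0};
    std::map<std::string, Bounds> wdf_bounds;   // per-term wdf bounds
    std::vector<std::uint32_t> doclengths;      // one entry per document
};

CollectionStats compute_stats(const std::vector<TermFreqs>& docs) {
    CollectionStats stats;
    for (const TermFreqs& doc : docs) {
        std::uint32_t doclength = 0;
        for (const auto& entry : doc) {
            const std::string& term = entry.first;
            const std::uint32_t wdf = entry.second;
            doclength += wdf;  // doclength is the sum of the wdfs

            auto it = stats.wdf_bounds.find(term);
            if (it == stats.wdf_bounds.end()) {
                stats.wdf_bounds[term] = Bounds{wdf, wdf};
            } else {
                it->second.lower = std::min(it->second.lower, wdf);
                it->second.upper = std::max(it->second.upper, wdf);
            }
        }
        if (stats.doclengths.empty()) {
            stats.doclength_bounds = Bounds{doclength, doclength};
        } else {
            stats.doclength_bounds.lower =
                std::min(stats.doclength_bounds.lower, doclength);
            stats.doclength_bounds.upper =
                std::max(stats.doclength_bounds.upper, doclength);
        }
        stats.doclengths.push_back(doclength);
        stats.total_length += doclength;
    }
    // Sanity check: the total can never be smaller than the longest document.
    assert(stats.total_length >= stats.doclength_bounds.upper);
    return stats;
}
```

The same pass also yields each document's length. Where to persist those
values alongside a Lucene format index is the open question, and this sketch
does not address it.
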
> 1. Could some other data be used instead of doclength?
> 2. Does Xapian support other ranking algorithms which do not need doclength?
> Are there any suggestions for solving this problem?
>
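
On question 2, one way to see when BM25 stops depending on doclength is to
look at the textbook BM25 term-frequency factor. The sketch below is that
textbook formula, not Xapian's actual code; bm25_tf_factor and its parameter
names are chosen for illustration only. With b = 0 the length normalisation
term drops out, so every doclength gives the same result.

```cpp
#include <cassert>

// Textbook BM25 term-frequency factor (an illustration, not Xapian's code).
//   wdf     : within-document frequency of the term
//   normlen : doclength divided by the average doclength
//   k1, b   : the usual BM25 tuning parameters
double bm25_tf_factor(double wdf, double normlen, double k1, double b) {
    assert(k1 >= 0.0 && b >= 0.0 && b <= 1.0);
    if (wdf <= 0.0) return 0.0;  // term absent: no contribution
    // b controls how strongly the document length affects the score;
    // with b == 0 this is exactly 1 whatever normlen is.
    const double length_norm = (1.0 - b) + b * normlen;
    return wdf * (k1 + 1.0) / (k1 * length_norm + wdf);
}
```

As far as I know, Xapian::BM25Weight exposes a b parameter of this kind, and
Xapian::BoolWeight does not use statistics at all. Whether the backend is
still asked for doclength in those cases would need checking against the
Xapian source.
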
> And the demo patch is here:
>
> https://github.com/white127/xapian-patch/blob/master/xapian_lucene_demo.patch
>
> Regards
> Jiang
>