[Xapian-devel] Backend for Lucene format indexes-How to get doclength
richard at tartarus.org
Mon Jun 17 16:06:36 BST 2013
You might want to look at how Lucene has implemented document length lookup
for the BM25Similarity class (added in Lucene 4.0):
I assumed they're using a document payload for storing the lengths, but
haven't looked into it.
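
A per-document length table does not have to be large if the lengths are stored lossily. The sketch below packs each document length into a single byte on a logarithmic scale, so lookup is one array index per document. It is an illustration only: the function names and the 8-steps-per-doubling scale are assumptions made for this example, not Lucene's actual encoding and not anything in Xapian.

```python
import math

STEPS_PER_DOUBLING = 8  # assumed resolution; higher means less error, smaller range


def encode_length(length: int) -> int:
    """Quantize a document length to one byte on a log scale (lossy)."""
    if length <= 0:
        return 0  # code 0 is reserved for empty documents
    code = 1 + round(STEPS_PER_DOUBLING * math.log2(length))
    return min(code, 255)  # clamp so the code always fits in a byte


def decode_length(code: int) -> float:
    """Recover the approximate document length from its one-byte code."""
    if code == 0:
        return 0.0
    return 2.0 ** ((code - 1) / STEPS_PER_DOUBLING)


# One byte per document, indexed by docid (hypothetical sample lengths).
doc_lengths = [1, 12, 250, 4000, 100000]
length_table = bytes(encode_length(n) for n in doc_lengths)

for docid, exact in enumerate(doc_lengths):
    approx = decode_length(length_table[docid])
    print(docid, exact, round(approx, 1))
```

With this scale the decoded value stays within about 5% of the exact length, which is usually acceptable for length normalisation, and a million documents need roughly 1 MB.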
On 17 June 2013 14:28, jiangwen jiang <jiangwen127 at gmail.com> wrote:
> *Or do you mean that it's one number per document whereas the other stats
> are per database, so it's harder to store it?*
> Yes, that is what I mean. It is a huge amount of data. If I add a new
> doclength list myself (one list containing every doclength, as chert
> does), I am concerned that:
> 1. This doclength list may become the bottleneck in this backend.
> 2. It changes too much on top of the Lucene file format, which makes it
> hard to compare performance between Xapian and Lucene.
> Some ideas:
> 1. Use a ranking algorithm that does not need doclength, such as
> BM25Weight or TradWeight without doclength, or tfidfWeight.
> Will the ranking results be poor without doclength?
> 2. Store doclength in the .prx payload when doing Lucene indexing.
> This method has an obvious drawback: it does not work for general
> Lucene index data. If doclength was not stored at indexing time, this
> method does not work.
> Any suggestions?
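
To make the first idea above concrete, here is a minimal sketch of a textbook BM25 term score in which the length normalisation can be switched off by setting b to 0. The function name and parameters are hypothetical and the formula is the common Robertson-style form, not Xapian's exact BM25Weight implementation (which has additional parameters).

```python
import math


def bm25_term_score(wdf, termfreq, num_docs,
                    doclen=None, avg_doclen=None, k1=1.2, b=0.75):
    """Textbook BM25 score for one term in one document.

    wdf      -- occurrences of the term in this document
    termfreq -- number of documents containing the term
    num_docs -- total number of documents
    With b == 0 (or no doclen supplied) no document length is needed.
    """
    idf = math.log(1 + (num_docs - termfreq + 0.5) / (termfreq + 0.5))
    if b == 0 or doclen is None:
        norm = 1.0  # length normalisation disabled
    else:
        norm = 1 - b + b * doclen / avg_doclen
    return idf * wdf * (k1 + 1) / (wdf + k1 * norm)


# Same term statistics, a short and a long document (made-up numbers).
short_doc = bm25_term_score(3, 50, 10000, doclen=100, avg_doclen=400)
long_doc = bm25_term_score(3, 50, 10000, doclen=1600, avg_doclen=400)
no_length = bm25_term_score(3, 50, 10000, b=0)
print(short_doc, long_doc, no_length)
```

With b at 0, every document with the same wdf gets the same score for the term, so long documents that mention a term often in passing are no longer penalised. Whether that hurts ranking noticeably depends on how much document lengths vary in the collection.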