[Xapian-devel] Backend for Lucene format indexes-How to get doclength

jiangwen jiang jiangwen127 at gmail.com
Mon Jun 17 14:28:23 BST 2013


*Or do you mean that it's one number per document whereas the other stats
are per database, so it's harder to store it?*

Yes, that is what I mean. It is a lot of data. If I add a new doclength list
myself (one list containing all the doclengths, as chert does), I am
concerned about two things:
1. This doclength list may become the bottleneck in this backend, see
http://trac.xapian.org/ticket/326
2. It changes the Lucene file format too much, which makes it hard to compare
performance between Xapian and Lucene.

Some ideas:
1. Use a ranking algorithm that does not need doclength, such as BM25Weight or
TradWeight without doclength, or tfidfWeight.
    Will the ranking results be poor without doclength?
2. Store doclength in the .prx payload when doing Lucene indexing.
    https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/Payload.html
    http://searchhub.org/2009/08/05/getting-started-with-payloads/
    But this method has an obvious drawback: it does not work for general
    Lucene index data. If doclength was not stored at indexing time, this
    method doesn't work.
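
To make idea 1 concrete, here is a rough sketch of the textbook BM25 term
weight, not Xapian's actual implementation. The function name and the default
values for k1 and b are my own assumptions for illustration. It shows that
setting b to 0 removes doclength from the formula entirely, so the backend
would not need to read a doclength for each document.

```python
import math


def bm25_term_weight(tf, df, num_docs, doclen, avg_doclen, k1=1.0, b=0.5):
    """Textbook BM25 weight of one term in one document (illustrative only).

    tf         -- occurrences of the term in this document
    df         -- number of documents containing the term
    num_docs   -- total number of documents in the index
    doclen     -- length of this document
    avg_doclen -- average document length over the index (must be > 0)
    """
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Length normalisation: with b == 0 this is exactly 1.0, so doclen
    # has no influence on the result.
    norm = 1.0 - b + b * (doclen / avg_doclen)
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


if __name__ == "__main__":
    for b in (0.5, 0.0):
        short = bm25_term_weight(3, 50, 10000, doclen=20, avg_doclen=200, b=b)
        long_ = bm25_term_weight(3, 50, 10000, doclen=2000, avg_doclen=200, b=b)
        print("b=%.1f short=%.4f long=%.4f" % (b, short, long_))
```

With b=0 the short and the long document get the same weight for the same
term frequency, which is the cost of this idea: long documents are no longer
penalised. Whether that hurts ranking quality depends on how much document
lengths vary in the collection.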

>
> Any suggestions?

Regards

