[Xapian-devel] Backend for Lucene format indexes-How to get doclength
jiangwen jiang
jiangwen127 at gmail.com
Tue Sep 3 07:38:32 BST 2013
Collection frequency means how many times a particular term appears across
all docs. This data does not exist in the Lucene backend (I will check on the
Lucene mailing list later).
Termfreq (how many docs contain a particular term) is the most similar data
to collection freq, but I don't think one can be substituted for the
other.
Now I am trying to calculate this data in copydatabase.
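
To make the difference concrete, here is a rough sketch of gathering these
numbers in a single pass over the documents, roughly what a copy step could
do. This is plain Python over toy data, not Xapian or Lucene code; the name
term_statistics and the token-list input are only assumptions for
illustration.

```python
from collections import Counter


def term_statistics(docs):
    """Return {term: (termfreq, collfreq, max_wdf)} for a list of token lists.

    termfreq: number of documents containing the term
    collfreq: total occurrences of the term across all documents
    max_wdf:  largest within-document frequency seen for the term
    """
    termfreq = Counter()
    collfreq = Counter()
    max_wdf = {}
    for tokens in docs:
        # wdf = how many times each term occurs in this one document
        for term, wdf in Counter(tokens).items():
            termfreq[term] += 1
            collfreq[term] += wdf
            max_wdf[term] = max(max_wdf.get(term, 0), wdf)
    return {t: (termfreq[t], collfreq[t], max_wdf[t]) for t in termfreq}


# Toy corpus (made up for this example).
docs = [
    ["apple", "apple", "pear"],
    ["apple", "fig"],
    ["pear", "pear", "pear"],
]
stats = term_statistics(docs)
for term in sorted(stats):
    print(term, stats[term])
```

In this toy data "apple" and "pear" have the same termfreq but different
collection freq, so termfreq alone cannot recover it. The collection freq is
never smaller than the largest wdf, which is why it can serve as a (loose)
upper bound when no tighter one is stored.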
Thanks
Regards
2013/9/2 Olly Betts <olly at survex.com>
> On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote:
> > TfIdfWeight and BM25 (b=0) also need wdf_upper_bound, which does not exist
> > in the Lucene backend.
>
> If you don't provide an implementation of wdf_upper_bound(), the default
> is to use the collection frequency of the term, so provided that
> information is available in the lucene files, the lack of
> wdf_upper_bound information isn't a show stopper.
>
> > I think this data will be calculated when doing copydatabase, I will
> > update the code later
>
> That's probably a good plan though.
>
> Cheers,
> Olly
>