[Xapian-discuss] Problem with weight_cutoff (not percent_cutoff)

Olly Betts olly at survex.com
Tue May 12 16:12:45 BST 2009


On Mon, May 11, 2009 at 06:44:34PM -0700, Kevin Duraj wrote:
> I have approximately 1 million documents matched by a search, and in
> order to sort them I need to cut off documents that I know at indexing
> time are less important than others. Therefore I assign a weight of
> 50+ to important documents during indexing, hoping that when my result
> set gets too big I can cut off all documents with a weight less than
> 50, as in the following example.
> 
> $enq->set_cutoff(0, 50);

The "weight" you are setting during indexing is the within-document
frequency (wdf) of a term.  This is used to calculate the weight of
a matching document, but the document weight won't simply be equal
to the wdf, at least not with the supplied weighting schemes.

If you want the document weight to simply equal the sum of the wdf,
you could implement your own weighting scheme where this was true
(you'll probably need to use 1.1.x for this as user weighting schemes
are rather restricted in the statistics they can access in 1.0.x).

But beware that this will probably give you noticeably worse search
results.  A better way to get rid of the unimportant documents would
be to add a boolean term to the important ones and filter the results by
this term.
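
As a rough illustration of the boolean-term approach, here is a minimal
sketch using the Perl bindings (Search::Xapian).  The term name
"XIMPORTANT", the sample documents, and the use of an in-memory database
are assumptions made for the example, not anything from your setup, and
the whitespace split stands in for whatever term generation you really
use.

```perl
use strict;
use warnings;
use Search::Xapian qw(:all);

# In-memory database so the sketch leaves nothing on disk (assumption:
# a real index would be opened by path instead).
my $db = Search::Xapian::WritableDatabase->new();

# Hypothetical sample data: [text, is_important].
my @docs = (
    [ 'xapian weighting schemes explained', 1 ],
    [ 'xapian mailing list chatter',        0 ],
);

for my $d (@docs) {
    my ($text, $important) = @$d;
    my $doc = Search::Xapian::Document->new();
    $doc->set_data($text);

    # Index each word as a term; no artificial wdf boost is applied.
    $doc->add_term($_) for split ' ', $text;

    # Mark important documents with a boolean-style marker term.
    # The name 'XIMPORTANT' is an assumption chosen for this example.
    $doc->add_term('XIMPORTANT') if $important;

    $db->add_document($doc);
}

# The user's query, restricted to documents carrying the marker term.
# OP_FILTER keeps the weights from the left-hand side only.
my $user_query = Search::Xapian::Query->new('xapian');
my $filter     = Search::Xapian::Query->new('XIMPORTANT');
my $query      = Search::Xapian::Query->new(OP_FILTER, $user_query, $filter);

my $enq = Search::Xapian::Enquire->new($db);
$enq->set_query($query);

# Print the stored data of each match in the first ten results.
my $mset = $enq->get_mset(0, 10);
for my $match ($mset->items()) {
    print $match->get_document()->get_data(), "\n";
}
```

With this arrangement only documents carrying the marker term should
come back, and their ranking still comes from the normal weighting
scheme rather than from an inflated wdf.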

Cheers,
    Olly


