[Xapian-discuss] Problem with weight_cutoff (not percent_cutoff)
Kevin Duraj
kevin.softdev at gmail.com
Wed May 13 00:16:55 BST 2009
Yes, that is also what I do: I add a boolean term to important documents,
and when the result set is very large I add that boolean term to the
search criteria so that fewer documents have to be retrieved and sorted. Now
I see that there is also a get_weight() function alongside the get_percent()
used in the examples that come with Xapian. I implemented get_weight()
to see how the search criteria correlate with all the variables, i.e. the
get_percent() and get_weight() values of matched documents.
For an example (Relevance: 100%, Weight: 11), see:
http://myhealthcare.com/search?q=allergy
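
To make the boolean-term approach concrete, here is a minimal sketch in
plain Python. It does not use Xapian at all: the in-memory DOCS table, the
term name "XIMPORTANT" and the per-document weights are all made up for
illustration. It only shows the idea that a boolean filter shrinks the set
of documents to be sorted without changing their weights. In Xapian itself
this kind of filtering is normally expressed with an OP_FILTER query.

```python
# Hypothetical in-memory stand-in for an index: doc id -> (terms, weight).
# "XIMPORTANT" is an assumed name for the boolean term added at indexing time.
DOCS = {
    1: ({"allergy", "XIMPORTANT"}, 11.0),
    2: ({"allergy"}, 9.5),
    3: ({"allergy", "pollen", "XIMPORTANT"}, 7.2),
    4: ({"pollen"}, 6.0),
    5: ({"allergy"}, 3.1),
}


def search(term, boolean_filter=None):
    """Return (doc_id, weight) pairs matching term, highest weight first."""
    hits = []
    for doc_id, (terms, weight) in DOCS.items():
        if term not in terms:
            continue
        # The filter only includes or excludes; it never alters the weight.
        if boolean_filter is not None and boolean_filter not in terms:
            continue
        hits.append((doc_id, weight))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


all_hits = search("allergy")
important_hits = search("allergy", boolean_filter="XIMPORTANT")
print(len(all_hits), "matches without the filter")
print(len(important_hits), "matches with the filter")
```

The filtered search returns fewer documents to sort, and each surviving
document keeps the same weight it had in the unfiltered search.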
Thanks,
Kevin Duraj
On Tue, May 12, 2009 at 8:12 AM, Olly Betts <olly at survex.com> wrote:
> On Mon, May 11, 2009 at 06:44:34PM -0700, Kevin Duraj wrote:
>> I have approximately 1 million documents matched by a search. In order
>> to sort them, I need to cut off documents that I know at indexing time
>> are not as important as others. Therefore I assign a weight of 50+ to
>> important documents during indexing, hoping that when my result set
>> gets too big I can cut off all documents with a weight below 50, as in
>> the following example.
>>
>> $enq->set_cutoff(0, 50);
>
> The "weight" you are setting during indexing is the within-document
> frequency (wdf) of a term. This is used to calculate the weight of
> a matching document, but the document weight won't simply be equal
> to the wdf, at least not with the supplied weighting schemes.
>
> If you want the document weight to simply equal the sum of the wdf,
> you could implement your own weighting scheme where this was true
> (you'll probably need to use 1.1.x for this as user weighting schemes
> are rather restricted in the statistics they can access in 1.0.x).
>
> But beware that this will probably give you noticeably worse search
> results. A better way to get rid of the unimportant documents would
> be to add a boolean term to the important ones and filter the results by
> this term.
>
> Cheers,
> Olly
>
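
To illustrate the point about wdf above, here is a rough sketch of why a
wdf of 50 does not produce a document weight of 50. It uses a textbook
BM25-style formula in plain Python, which is an assumption for illustration
and not Xapian's exact implementation. The parameter values (k1, b, the
collection size and the term frequency) are made up.

```python
import math


def bm25_term_weight(wdf, doc_len, avg_len, n_docs, termfreq, k1=1.0, b=0.5):
    """Textbook BM25-style weight of one term in one document (illustrative)."""
    # Rarer terms get a larger inverse document frequency factor.
    idf = math.log((n_docs - termfreq + 0.5) / (termfreq + 0.5))
    # Document length relative to the average document length.
    norm_len = doc_len / avg_len
    # The wdf factor saturates: it approaches (k1 + 1) as wdf grows.
    return idf * (wdf * (k1 + 1)) / (wdf + k1 * ((1 - b) + b * norm_len))


for wdf in (1, 10, 50, 100):
    weight = bm25_term_weight(
        wdf, doc_len=200, avg_len=200, n_docs=1_000_000, termfreq=10_000
    )
    print(f"wdf={wdf:3d} -> weight={weight:.2f}")
```

Under these assumptions the weight grows with wdf but levels off quickly,
so a cutoff of 50 on the document weight does not correspond to a wdf of 50.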