Weighting recent results

Olly Betts olly at survex.com
Wed May 18 01:17:35 BST 2016


On Mon, May 16, 2016 at 12:35:53PM -0400, Alex Aminoff wrote:
> I was thinking about this some more: Is there a reason I can't just
> weight by some function of recency at indexing time?
> 
>  $weight = get_weight_based_on_recency(...);
>  $tg->index_text($txt,$weight);

The second parameter there is a WDF (within-document frequency)
multiplier, which isn't really a "weight".  The effect depends on the
weighting formula you're using (and the parameters set for it), but
simply scaling up the WDF values for a whole document is likely to be
counteracted by the corresponding increase in the document length
(since that is SUM(WDF)).  The average document length would also stop
reflecting how long documents really are, which will probably make the
relevance weighting less effective.
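
To make the cancellation concrete, here is a rough sketch in plain
Python (no Xapian calls).  It assumes the textbook BM25 term-frequency
component, which is not necessarily exactly what Xapian computes, and
it holds the average length fixed, which a real database would not.
The names bm25_tf and demo are made up for illustration.

```python
def bm25_tf(wdf, doc_len, avg_len, k1=1.0, b=1.0):
    """Textbook BM25 term-frequency component (IDF factor omitted)."""
    norm_len = doc_len / avg_len
    return wdf * (k1 + 1) / (wdf + k1 * ((1 - b) + b * norm_len))


def demo(multiplier, b):
    """Score one term before and after scaling every WDF in the document.

    Scaling every WDF by `multiplier` also scales the document length by
    the same factor, because the length is the sum of the WDFs.
    """
    wdf, doc_len, avg_len = 3, 100, 100  # illustrative numbers only
    plain = bm25_tf(wdf, doc_len, avg_len, b=b)
    boosted = bm25_tf(wdf * multiplier, doc_len * multiplier, avg_len, b=b)
    return plain, boosted
```

Under these assumptions, with full length normalisation (b=1) the two
scores come out identical, so the multiplier has no effect at all.
With weaker normalisation (b<1) the boosted score is higher, but it can
never exceed k1+1, so even a large multiplier buys only a small gain.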

Also, recency changes as time passes, so you'll either have to
reindex regularly, or else the $weight given to newly indexed documents
will have to keep increasing over time.
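
As a sketch of that second option, suppose the multiplier is fixed at
indexing time relative to an arbitrary reference date, so that older
documents never need reindexing.  This is plain Python rather than
Perl, and EPOCH, DOUBLING_DAYS and index_time_multiplier are all
hypothetical names and values chosen only to show the growth.

```python
from datetime import date

EPOCH = date(2016, 1, 1)   # hypothetical fixed reference date
DOUBLING_DAYS = 180        # hypothetical: multiplier doubles every 180 days


def index_time_multiplier(indexed_on):
    """Multiplier chosen once, when the document is indexed.

    Newer documents get a larger value than older ones, and nothing
    already in the index has to be touched again.
    """
    age_days = (indexed_on - EPOCH).days
    return max(1, round(2 ** (age_days / DOUBLING_DAYS)))


if __name__ == "__main__":
    for year in (2016, 2018, 2020, 2026):
        print(year, index_time_multiplier(date(year, 1, 1)))
```

With these made-up settings the multiplier passes a million within ten
years, and every WDF and document length in the index grows with it.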

So it seems a problematic approach to me.  I think you'd need to try it
to see if it can be made to work satisfactorily, and probably be
prepared to tweak the weighting scheme parameters.

Cheers,
    Olly


