[Xapian-discuss] Relevance wdf

James Aylett james-xapian at tartarus.org
Fri Mar 18 16:58:00 GMT 2005


On Fri, Mar 18, 2005 at 05:49:59PM +0100, roki roki wrote:

> > What are you actually trying to achieve? There may be other approaches
> > worth considering.
> 
> I want to give some terms in the HTML document (e.g. those inside
> <b></b> or <h></h>) more weight than normal terms.

Right. But in that case you really /don't/ want the behaviour you
asked for. You just want to let BM25 do its job and give you a ranking
for each document. Otherwise you'll start getting strange effects, both
for queries with more terms and when a term is upweighted in a
document that isn't terribly important: that document can then outrank
a more important one that has the term but without the upweighting.
It's difficult to come up with a realistic example of this happening,
but I'd be uneasy about defeating BM25 in the way I suggested for a
normal search system.

An alternative might be for you to recompile the Xapian library with
different defaults for BM25, using the 0 values Richard
suggested. That's a better way of achieving the same effect, without
any risk of breaking BM25, without adding those extra terms, and
without having to improve the Perl bindings :-)

Unless you have a good reason to, I'd trust BM25 to do its job :-)

(It's possible that my suggestion will have almost the same effect as
Richard's, but I can never remember how pulling the algorithm around
actually works, and I don't have time to sit down with the formula and
work it out. I just wouldn't trust throwing in terms with a high wdf
but no other meaning to achieve something you can simply turn off,
even if turning it off involves a library recompile.)
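
For anyone who does want to sit down with the formula, here is a
minimal Python sketch of the textbook BM25 term weight. It is not
Xapian's actual implementation, and the function name, the parameter
names and the default values (k1=1.0, b=0.5) are assumptions chosen
for illustration. This message does not say which parameters Richard's
suggestion sets to 0; the sketch zeroes k1 only to show the general
idea that a zeroed parameter can switch off the influence of wdf.

```python
import math


def bm25_term_weight(wdf, doc_len, avg_len, n_docs, term_freq,
                     k1=1.0, b=0.5):
    """Textbook BM25 weight of one term in one document.

    Illustrative only: all names and defaults here are hypothetical,
    not taken from the Xapian source.
    """
    if wdf == 0:
        # A term that does not occur in the document contributes nothing.
        return 0.0
    # Rarer terms (small term_freq relative to n_docs) score higher.
    idf = math.log((n_docs - term_freq + 0.5) / (term_freq + 0.5))
    # Length normalisation: b=0 ignores document length entirely.
    K = k1 * ((1 - b) + b * doc_len / avg_len)
    # The wdf part saturates: it can never exceed (k1 + 1).
    return idf * (k1 + 1) * wdf / (K + wdf)


if __name__ == "__main__":
    # Hypothetical collection: 1000 documents, term occurs in 10 of them,
    # and the document is of average length.
    for wdf in (1, 5, 50):
        normal = bm25_term_weight(wdf, 100, 100, 1000, 10)
        flat = bm25_term_weight(wdf, 100, 100, 1000, 10, k1=0.0)
        print(f"wdf={wdf:3d}  k1=1: {normal:.3f}  k1=0: {flat:.3f}")
```

With a non-zero k1 the weight rises with wdf but flattens out, so
inflating wdf for terms in <b> or <h> shifts the ranking in ways that
depend on the rest of the query and on document length. With k1=0 the
wdf cancels out and every document containing the term gets the same
weight for it.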

James

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


