[Xapian-discuss] How to deal with "zones" on documents

Olly Betts olly at survex.com
Sun Oct 30 11:45:46 GMT 2005


On Sat, Oct 29, 2005 at 12:40:40PM +0200, Gilles Polart-Donat wrote:
> I want to have a different weight on result for a term, if it comes from 
> different parts of a document.
> 
> For example, on HTML files, from tags <title>, <h1>, <h2>, ...

Assuming you know the relative weights you want to apply to the
different tags, just apply extra wdf (within-document frequency) to
terms generated from those fields.  You do this by passing a value
greater than 1 for the optional third argument to
Document::add_posting() (or the optional second argument to
Document::add_term()).

E.g.

    doc.add_posting(term, pos, wdfinc);

    doc.add_term(term, wdfinc);

So "wdfinc" is the "extra weight factor".  Note that it must be an
integer.
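
As a rough sketch of the static approach, the indexer could look up a
wdf increment per tag and pass it along with every term from that tag.
The weights (5, 3, 2), the zone_wdfinc() and index_zone() helpers, and
the FakeDocument stand-in are all made up for illustration;
FakeDocument only exists so the sketch compiles without Xapian, and
Xapian::Document::add_posting() takes the same three arguments.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical per-tag wdf increments; tune these for your own collection.
inline unsigned zone_wdfinc(const std::string& tag) {
    static const std::map<std::string, unsigned> weights = {
        {"title", 5}, {"h1", 3}, {"h2", 2}};
    auto it = weights.find(tag);
    return it == weights.end() ? 1 : it->second;  // anything else gets 1
}

// Stand-in that records the accumulated wdf per term (assumption: used
// here only so the example builds without the Xapian headers).
struct FakeDocument {
    std::map<std::string, unsigned> wdf;
    void add_posting(const std::string& term, unsigned /*pos*/,
                     unsigned wdfinc = 1) {
        wdf[term] += wdfinc;
    }
};

// Index the terms from one zone, boosting wdf by that zone's increment.
// Doc can be any type with add_posting(term, pos, wdfinc).
template <typename Doc>
void index_zone(Doc& doc, const std::string& tag,
                const std::vector<std::string>& terms, unsigned& pos) {
    const unsigned wdfinc = zone_wdfinc(tag);
    for (const std::string& term : terms) {
        doc.add_posting(term, ++pos, wdfinc);
    }
}
```

With these made-up weights, a term that appears once in <title> and
once in the body ends up with a wdf of 6 rather than 2.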

If you want to be able to tune the factors dynamically, you could
index terms from particular tags with particular prefixes (e.g.
the term "xapian" in <h1> might be XH1:xapian).  Then for each
term in the query, you'd produce an OR of the various possible
forms with the wqf (within query frequency) set appropriately
for each form:

(xapian OR XH1:xapian OR XH2:xapian OR ...)
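
The expansion of one query term into its prefixed forms could be
sketched like this.  ZoneBoost, expand_term() and the example wqf
values are hypothetical names and numbers, not part of Xapian; the
sketch only builds the (term, wqf) pairs.  With Xapian, each pair
would then presumably become a Xapian::Query(term, wqf) and the
results would be combined with Xapian::Query::OP_OR.

```cpp
#include <string>
#include <utility>
#include <vector>

// One query-time weighting per zone.  The prefixes must match whatever
// was used at index time (assumption: "XH1:" and "XH2:" as in the
// example above, and an empty prefix for unprefixed terms).
struct ZoneBoost {
    std::string prefix;
    unsigned wqf;  // within query frequency to give this form
};

// Expand a single query term into one (term, wqf) pair per zone.
std::vector<std::pair<std::string, unsigned>>
expand_term(const std::string& term, const std::vector<ZoneBoost>& zones) {
    std::vector<std::pair<std::string, unsigned>> forms;
    forms.reserve(zones.size());
    for (const ZoneBoost& zone : zones) {
        forms.emplace_back(zone.prefix + term, zone.wqf);
    }
    return forms;
}
```

Because the wqf values live in the query rather than the index, they
can be changed per search without reindexing.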

The static way is likely to be more efficient, so use that if you
can.

Cheers,
    Olly


