[Xapian-discuss] Strange Weighting issue

John Wards jwards at whiteoctober.co.uk
Tue Sep 8 15:41:00 BST 2009


On Tue, Sep 8, 2009 at 3:05 PM, Richard Boulton<richard at tartarus.org> wrote:
> (I assume that $indexer in your code is an instance of
> Xapian::TermGenerator)

Yes, sorry. Bit copy and paste-tastic.

> The default Xapian weighting formula has a term which reduces the weight of
> large documents; the theory being that a small document with relevant
> information in it is better than a large one, because the small one is
> likely to be more tightly focussed on the topic.  As you're seeing, this
> compensation can mean that the same document repeated 100 times is
> considered less good than a single repetition of the document.

Ah ha, this explains why it worked the last time I did this: the
documents were all very small. This time they are rather large.

> Instead, if you're using the 1.0 release series, I recommend using a single
> extra term, added to all the documents, but with a weight chosen according
> to the document's importance.  Using a single term shouldn't disrupt the
> document length so much, but should allow you to weight your searches
> appropriately (you'll need to include that term in all your queries, of
> course: combine your existing queries with the weight term using the
> AND_MAYBE operator).

Right, I get the part about adding the weight as a term at indexing time.

$doc->add_term("XWEIGHT".$this->getWeighting());

However, I don't see how this would help boost the document in the
results. I have 10 document types, each with a different weight
set against it.

I need all types to be returned, but with the higher-weighted types
given a boost. The idea is to boost product pages over news pages, for
example: even if the news page is textually more relevant, the type
of content is what the user is really searching for.

Do I have to loop over my list of weights, adding an AND_MAYBE clause
for each term?
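
For comparison, here is a minimal sketch of the other way I could read the
suggestion: one shared term name on every document, with the importance
passed as the wdf rather than baked into the term name, so the query only
needs a single AND_MAYBE clause. This assumes the PHP5 bindings (xapian.php)
and an integer weight; the helper names add_weight_term and
with_weight_boost are made up for illustration, and how much the wdf
actually moves the ranking will depend on the weighting scheme.

```php
<?php
// Sketch only: assumes the Xapian PHP5 bindings are installed.
require_once 'xapian.php';

// Indexing side (hypothetical helper): the same term on every document.
// The second argument of add_term() is the wdf increment, so a more
// important document type gets a larger wdf for the shared term.
function add_weight_term(XapianDocument $doc, $weight)
{
    $doc->add_term('XWEIGHT', (int) $weight);
}

// Search side (hypothetical helper): one AND_MAYBE clause, no loop.
// The left-hand query decides which documents match; the right-hand
// term only contributes extra weight to documents that contain it.
function with_weight_boost(XapianQuery $userQuery)
{
    return new XapianQuery(
        XapianQuery::OP_AND_MAYBE,
        $userQuery,
        new XapianQuery('XWEIGHT')
    );
}
```

At indexing time that would be add_weight_term($doc, $this->getWeighting()),
and at search time the parsed query would be passed through
with_weight_boost() before handing it to the enquire object.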

Regards
John


