[Xapian-discuss] weighting of documents and terms

Olly Betts olly at survex.com
Fri Jan 12 23:42:24 GMT 2007


On Fri, Dec 29, 2006 at 05:24:53PM +0100, Felix Antonius Wilhelm Ostmann wrote:
> I want to sort the results by my own special weighting; I never sort by
> relevance. But if I use sort_by_value, it is really slow :( 20 times
> slower than by relevance.

Sort by value is slower because we need to read the values for candidate
documents.

Incidentally, I think this could be improved by storing values
differently.  Currently we store all the values for a document together,
keyed on the docid (one consequence is that storing extra values you
don't use slows down access to the ones you do).  I think it would be
significantly better overall to store a stream for each value number,
split into chunks (rather like we do already for posting lists).
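
As a rough illustration of the two layouts, here is a toy Python model.
It is not Xapian code: the names per_doc, build_slot_streams and
read_slot_stream, the sample data, and the chunk size are all invented
for this sketch, and the real on-disk formats differ.

```python
from bisect import bisect_right

# Layout A (current): one record per docid holding every value slot.
# Reading one slot means fetching the whole record, unused slots included.
per_doc = {
    1: {0: "a1", 1: "20"},
    2: {0: "a2", 1: "35"},
    3: {0: "a3", 1: "7"},
    4: {1: "12"},
    5: {0: "a5", 1: "50"},
}


def build_slot_streams(store, chunk_size=2):
    """Layout B (proposed): one docid-ordered stream per slot, cut into chunks."""
    by_slot = {}
    for docid in sorted(store):
        for slot, value in store[docid].items():
            by_slot.setdefault(slot, []).append((docid, value))
    return {
        slot: [entries[i:i + chunk_size]
               for i in range(0, len(entries), chunk_size)]
        for slot, entries in by_slot.items()
    }


def read_slot_stream(streams, slot, docid):
    """Find the chunk that could hold docid, then scan only that chunk."""
    chunks = streams.get(slot, [])
    first_docids = [chunk[0][0] for chunk in chunks]
    idx = bisect_right(first_docids, docid) - 1
    if idx < 0:
        return None
    for d, value in chunks[idx]:
        if d == docid:
            return value
    return None


streams = build_slot_streams(per_doc)
```

With this layout, a sort on slot 1 only ever touches the chunks of the
slot 1 stream, so extra slots stored on the same documents cost nothing
at search time.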

> Can I modify the relevance sort so it works
> fine for me? What must I do? The only way I can see to modify it is to
> call set_weighting_scheme on the Enquire at search time.

You don't seem to say what your "special weighting" is.

If it's a pre-calculated weight for each document, you could store it as
the wdf of a special extra term which indexes every document.  Then a
query Q becomes `X FILTER Q' and you can write a custom weighting scheme
which returns the weight stored in the wdf of the special term X.
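
The idea can be sketched with a toy inverted index in Python.  This is
only a model of the technique, not Xapian's API: build_index, search,
the term name XWEIGHT and the sample documents are hypothetical, and Q
is limited to a single term to keep it short.  Since a wdf is an integer
count, the pre-calculated weight is assumed to be a non-negative
integer (scale it first if it isn't).

```python
WEIGHT_TERM = "XWEIGHT"  # hypothetical special term that indexes every document


def build_index(docs):
    """Build term -> {docid: wdf}; the special term's wdf holds the doc weight."""
    index = {}
    for docid, (terms, weight) in docs.items():
        for term in terms:
            postings = index.setdefault(term, {})
            postings[docid] = postings.get(docid, 0) + 1
        # Store the pre-calculated weight as the wdf of the special term.
        index.setdefault(WEIGHT_TERM, {})[docid] = weight
    return index


def search(index, query_term):
    """Model of `X FILTER Q': Q only selects documents, X alone supplies the score."""
    weights = index.get(WEIGHT_TERM, {})
    matches = index.get(query_term, {})
    ranked = [(docid, weights[docid]) for docid in matches if docid in weights]
    # Highest stored weight first; ties broken by docid for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


docs = {
    1: (["apple", "pie"], 20),
    2: (["apple"], 40),
    3: (["pear"], 99),
}
index = build_index(docs)
print(search(index, "apple"))  # [(2, 40), (1, 20)]
```

Document 3 has the largest stored weight but doesn't match Q, so it is
filtered out; the matching documents come back ordered purely by the
weight stored on the special term.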

It's not really how this was expected to be used, but it would do the
job.  In a way, it's a quick hack implementation of the different way
of storing values I describe above!

> The second problem is that I need a weight by value and term :-/ The value
> is perhaps 20 and this term has a weight of 2, so the value must be 40
> for the sorting.

I'm not sure I follow.  Can you describe the situation in a bit more
detail?

Cheers,
    Olly


