[Xapian-devel] adaptive query scoring
Olly Betts
olly at survex.com
Wed May 17 08:36:22 BST 2006
On Tue, May 16, 2006 at 09:29:12AM -0700, Alexander Lind wrote:
> > There certainly are words which
> > have many meanings where there's less correlation (e.g. 'stock market'
> > vs 'vegetable stock') and even word order can make a big difference
> > (e.g. 'oil bath' vs 'bath oil'). But for the 'stock' example, a query
> > for just 'stock' could usefully promote results from both, and a query
> > for 'stock market' would have 'market' in too, so although the cookery
> > pages would get a boost, the financial pages would get a larger one.
>
> Yeah there must be tons of word pairs out there that would benefit from
> some sort of 'mutual' scheme, but then there are probably a great deal
> that would suffer from them too. Especially in our data set.
There are many such tradeoffs when building a search system - you
improve results for some queries, but the same change also gives worse
results for others. And you can always find a bad example if you look
hard enough, so it's unwise to base a decision on a single
"counter-example".
I'm not saying you're wrong, just pointing out that this isn't
easy to get right.
> I can see that your theory here of favoring all results that get
> clicked regardless of the query can work for a regular web search, but I
> don't think it will pan out as well for us, who have specific products
> as results.
I think you're right.
> I'd be happy to contribute code back to the Xapian project if you think
> there is any use for it. I can only offer php code, but for example I
> have two classes, one indexing and one search class, which may be
> suitable for php-examples for other php:ers to look at. They use many
> more of the xapian features than the present examples do.
More sample code would definitely be useful. I've created a wiki page
to house such links:
http://wiki.xapian.org/SampleCode
It might be good to have features in the library to support this sort
of use better, but I'm not really sure what they'd look like. One thing
we could do is handle modifying an existing document more efficiently,
which would be handy when you want to add terms for the clicks.
Currently we just rewrite all the postings, but we don't actually need
to touch those terms which haven't changed - it's not been optimised
yet because it's not a case which usually matters.
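To illustrate the idea (this is only a rough sketch in Python, not how
the library is implemented), the optimisation amounts to comparing the
document's old and new term lists and only touching the postings that
differ. The function name, the {term: wdf} dictionaries and the
"XCLICK" prefix below are all made up for the example:

```python
def changed_postings(old_terms, new_terms):
    """Compare two {term: wdf} maps for the same document.

    Returns three sets: terms whose postings must be added, removed,
    or updated. Terms absent from all three sets are unchanged and
    would not need to be rewritten.
    """
    old_keys = set(old_terms)
    new_keys = set(new_terms)
    added = new_keys - old_keys
    removed = old_keys - new_keys
    # Terms present in both versions only matter if their wdf changed.
    updated = {t for t in old_keys & new_keys if old_terms[t] != new_terms[t]}
    return added, removed, updated


# Hypothetical product document; "XCLICK" is an invented prefix for
# terms recording the query words that led to a click.
old = {"bath": 2, "oil": 1, "lavender": 1}
new = dict(old)
new["XCLICKoil"] = 1  # record one click from a query containing 'oil'

added, removed, updated = changed_postings(old, new)
print(sorted(added), sorted(removed), sorted(updated))
```

Here only the one new click term needs a posting written; the three
existing terms are left alone, which is the saving over rewriting
everything.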
Cheers,
Olly