[Xapian-discuss] Help with weights

Robert Kaye rob at eorbit.net
Wed Jul 2 00:15:53 BST 2008


Hi!

Every time I think I've got the Xapian search for MusicBrainz licked, I
ask for more feedback and my community finds yet another test case
that throws a monkey wrench into my project. And the more I try to
understand Xapian's weighting system, the less I really understand it.

Let me ask a specific question -- in my release index (an index of CD  
titles, essentially) I have a field called type. When the value of  
this field is "album" I give it a termcount of 100. All other values  
for this field and all other fields get a termcount of 1.
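
To make the setup concrete, here is a minimal sketch in plain Python. It is not the actual MusicBrainz indexer and does not call Xapian: the term names, the collection statistics (1000 documents, 100 containing "love") and the simplified textbook BM25 formula are all assumptions for illustration. It models a document as a map from term to termcount, takes the document length to be the sum of the termcounts, and compares the weight that one occurrence of "love" would get in an "album" release and in a non-"album" release.

```python
import math


def doc_length(terms):
    """Document length modelled as the sum of all termcounts."""
    return sum(terms.values())


def bm25_term_weight(wdf, doc_len, avg_len, n_docs, n_with_term, k1=1.0, b=0.5):
    """Simplified textbook BM25 weight of one query term in one document.

    The k1 and b defaults are illustrative assumptions, not values read
    from any particular Xapian configuration.
    """
    idf = math.log((n_docs - n_with_term + 0.5) / (n_with_term + 0.5) + 1.0)
    norm_len = doc_len / avg_len
    return idf * wdf * (k1 + 1.0) / (wdf + k1 * (1.0 - b + b * norm_len))


# Hypothetical term names; the real index uses its own field prefixes.
album = {"love": 1, "type:album": 100}   # type "album" gets termcount 100
single = {"love": 1, "type:single": 1}   # every other value gets termcount 1

docs = [album, single]
avg_len = sum(doc_length(d) for d in docs) / len(docs)

# Made-up collection statistics: 1000 releases, 100 of them contain "love".
album_weight = bm25_term_weight(album["love"], doc_length(album), avg_len, 1000, 100)
single_weight = bm25_term_weight(single["love"], doc_length(single), avg_len, 1000, 100)

print("lengths:", doc_length(album), doc_length(single))
print("weights for 'love':", album_weight, single_weight)
```

In this toy model the termcount of 100 only makes the "album" release look much longer, so a query for "love" alone scores it lower than the non-"album" release. Whether the real index behaves the same way depends on the weighting scheme actually in use.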

For the Enquire, I use a stock object: I do not set a weighting
scheme and do not tinker with doc order or sort order. When I search for
the term "love" in the release title (a very common term), the top hits
are the ones that contain the word "love" twice. Good.

But, for all the hits that have the word "love" in them once, I would  
expect to see the releases of type "album" to be near the top. But  
they are not:

http://musicbrainz.homeip.net/search/textsearch.html?query=love&handlearguments=1&limit=25&type=release&adv=0&offset=0

They make up the *bottom* 3-4 pages of the results, meaning they got  
ranked BELOW all the non-"album" values:

http://musicbrainz.homeip.net/search/textsearch.html?query=love&handlearguments=1&limit=25&type=release&adv=0&offset=250

I can clearly see that my weighting is having an effect, but it's the
opposite of the effect I am expecting.

What am I missing here? Any tips would be appreciated!

--

--ruaok      Somewhere in Texas a village is *still* missing its idiot.

Robert Kaye     --     rob at eorbit.net     --    http://mayhem-chaos.net
