[Xapian-discuss] Need more explanations about Xapian's expanding

Ivan Sutter ivansutter at gmail.com
Fri Sep 26 17:11:40 BST 2008


Hi there,

I'm using Xapian with a database containing movies, TV shows, etc. (about
35000) and actors (about 35000 too).
Indexing and basic search work well, but the results of query expansion
are not very relevant.

I'm drawing inspiration from simpleexpand.php5, and I am trying to play with
the values passed to get_mset() and get_eset() to find the best
suggestions.

First, I get the top 40 results:
$matches = $enquire->get_mset(0, 40, $rset);
That works well, and 40 is usually more than enough (I mean I often get
fewer than 40 results).

Then, I call this part (sorry for the plain copy-paste):
// If no relevant docids were given, invent an RSet containing the top 5
// matches (or all the matches if there are less than 5).
if ($rset->is_empty()) {
    $c = 20; // so here I've put 20 instead of 5...
    $i = $matches->begin();
    while ($c-- && !$i->equals($matches->end())) {
        $rset->add_document($i->get_docid());
        $i->next();
    }
}
What seems weird to me is that $rset is still empty at this point, even
though it was already passed to the previous get_mset() call! I must have
missed something.

Finally, I get the suggestions:
$eset = $enquire->get_eset(10, $rset);


As you can see, I haven't mastered all these lines... I would just like some
help understanding how these "ratios" (the 40, 20 and 5) affect the result.
Don't worry, I've run tests, but given the amount of data, it's hard
to know whether I've found a truly good setting or whether it's just luck!
So a "scientific" explanation would be great!

Thanks in advance.

