[Xapian-discuss] Need more explanations about Xapian's expanding

Morningrise ivansutter at gmail.com
Tue Sep 30 09:47:09 BST 2008



Olly Betts wrote:
> 
> Well, "40" is just the size of the MSet you've requested.
> 
> "20" is how many documents from the MSet you are adding to the RSet, and
> "5" is how many the example you start from added.
> 
> I suspect 20 is too many - you want the RSet to contain genuinely
> relevant documents.  Ideally the user would pick the relevant documents,
> but you can often get reasonable results by assuming that the top few
> entries from the MSet are relevant.  But the more you add, the more
> likely that some won't actually be relevant - I would guess that 20 is
> too high, especially if you are often getting less than 40 results in
> total.
> 
> You could probably look at how the MSet weights vary to pick a cut-off
> dynamically.  I've not done tests, but it seems likely that you don't
> want to keep adding documents once the weights drop sharply.
> 

OK, I think I get it. I'll run more tests.
Your last tip about the weights sounds like a good idea!
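
For the record, here is a minimal sketch of how the weight-based cut-off
suggested above could look. It is untested against real data, as Olly says.
The function name pick_rset_size and the max_docs and drop_ratio parameters
are my own assumptions, not part of Xapian, and the numbers are made up. It
only works on a plain list of weights, sorted the way the MSet returns them
(highest first).

```python
def pick_rset_size(weights, max_docs=5, drop_ratio=0.5):
    """Return how many top-ranked documents to put in the RSet.

    weights    -- MSet weights in rank order (highest first)
    max_docs   -- hard upper limit on the RSet size (assumed default)
    drop_ratio -- stop when a weight falls below this fraction of the
                  previous one (assumed default, needs tuning)
    """
    if not weights or max_docs <= 0:
        return 0
    count = 1  # the top hit is always kept
    for prev, cur in zip(weights, weights[1:]):
        if count >= max_docs:
            break  # hard limit reached
        if cur < prev * drop_ratio:
            break  # sharp drop: stop adding documents
        count += 1
    return count


# Illustrative weights only: the sharp drop comes after the third hit.
example_weights = [9.1, 8.7, 8.5, 3.2, 3.0]
rset_size = pick_rset_size(example_weights)
```

The weights would come from the MSet entries, and only the first rset_size
documents would then be added to the RSet. Both default values are guesses
and would need the kind of empirical testing discussed in this thread.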


Olly Betts wrote:
> 
> I wonder if you meant "10" not "5"?  "10" is the number of terms you'd
> like in the ESet.
> 
Yes indeed! Sorry.


Olly Betts wrote:
> 
> I'm not sure "science" can automatically give you good values for the
> number of documents to add to the auto-generated RSet and the number of
> relevant terms to ask for.  You probably do want to run some tests to
> empirically validate the numbers you're using.
> 
> Cheers,
>     Olly
> 
OK. I just wanted to know whether there are any "standard" values (for
example, 500 items in the MSet for a big database and only 10 for a small
one), but you've answered my question well.

Thanks a lot for this clarification!



