[Xapian-discuss] Need more explanations about Xapian's expanding
Morningrise
ivansutter at gmail.com
Tue Sep 30 09:47:09 BST 2008
Olly Betts wrote:
>
> Well, "40" is just the size of the MSet you've requested.
>
> "20" is how many documents from the MSet you are adding to the RSet, and
> "5" is how many the example you start from added.
>
> I suspect 20 is too many - you want the RSet to contain genuinely
> relevant documents. Ideally the user would pick the relevant documents,
> but you can often get reasonable results by assuming that the top few
> entries from the MSet are relevant. But the more you add, the more
> likely that some won't actually be relevant - I would guess that 20 is
> too high, especially if you are often getting less than 40 results in
> total.
>
> You could probably look at how the MSet weights vary to pick a cut-off
> dynamically. I've not done tests, but it seems likely that you don't
> want to keep adding documents once the weights drop sharply.
>
OK, I think I get it. I'll run some more tests.
Your last tip about the weights sounds like a good idea!
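
Here is a rough sketch of how that weight-based cut-off might look. It is only my reading of the tip, not something tested against a real index. It takes the MSet weights in rank order and stops counting documents for the RSet once a weight falls below a fraction of the previous one. The names pick_rset_size, drop_ratio and max_docs are made up for illustration, and the 0.7 threshold is an arbitrary guess that would need tuning.

```python
def pick_rset_size(weights, drop_ratio=0.7, max_docs=20):
    """Return how many top-ranked documents to treat as relevant.

    weights    -- MSet weights in rank order (highest first)
    drop_ratio -- hypothetical threshold: a weight below this fraction
                  of the previous weight counts as a "sharp drop"
    max_docs   -- upper bound on the size of the RSet
    """
    if not weights or max_docs < 1:
        return 0
    count = 1  # the top hit is always kept
    for prev, cur in zip(weights, weights[1:]):
        if count >= max_docs:
            break  # reached the upper bound
        if prev <= 0 or cur < drop_ratio * prev:
            break  # weights dropped sharply, stop here
        count += 1
    return count


if __name__ == "__main__":
    # Made-up weights: the sharp drop is between the 3rd and 4th hit.
    print(pick_rset_size([10.0, 9.5, 9.0, 4.0, 3.9]))
```

With the Python bindings, I assume the weights could be collected with something like [m.weight for m in mset], and the first n documents then added to the RSet with rset.add_document(m.docid).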
Olly Betts wrote:
>
> I wonder if you meant "10" not "5"? "10" is the number of terms you'd
> like in the ESet.
>
Yes, indeed! Sorry.
Olly Betts wrote:
>
> I'm not sure "science" can automatically give you good values for the
> number of documents to add to the auto-generated RSet and the number of
> relevant terms to ask for. You probably do want to run some tests to
> empirically validate the numbers you're using.
>
> Cheers,
> Olly
>
OK. I just wanted to know whether there are any "standards" (for example,
maybe 500 items in the MSet for a big database and only 10 for a small
one), but you've answered my question well.
Thanks a lot for the clarification!