[Xapian-discuss] How many docs to feed to an RSet?

Richard Boulton richard at lemurconsulting.com
Fri Feb 29 10:58:57 GMT 2008


Matthew Somerville wrote:
> Hi,
> 
> I'm just trying out get_eset() to fetch terms to go under a "possibly 
> relevant terms" or similar heading on my search results pages. When 
> compiling the RSet to feed to get_eset(), how many documents should I add? 
> As I've just fetched results for a search, do I feed in all the result 
> documents (a default of 20 on a page), or less?

The best answer is to play around, experiment, and see what seems to 
work for you.  The dataset, and the types of queries you're doing, will 
both have a big effect.

I've found that a value of 10 works well with some datasets - but you 
may find that it gives terrible results.
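
To make that experiment easy, here is a minimal sketch against the
Python bindings which keeps the number of documents fed to the RSet as
a parameter.  The function name suggest_terms and the top_n/max_terms
defaults are made up for illustration; only get_mset(), add_document()
and get_eset() are Xapian calls.  It blindly treats the top matches as
relevant, which is an assumption rather than a recommendation.

```python
def suggest_terms(enquire, new_rset, top_n=10, max_terms=10):
    """Return expansion terms derived from the top_n matches.

    enquire  -- a xapian.Enquire on which set_query() was already called
    new_rset -- callable returning an empty relevance set (xapian.RSet)
    top_n    -- how many top-ranked documents to treat as relevant
                (hypothetical default; tune it for your data)
    """
    # Fetch only as many matches as we intend to treat as relevant.
    mset = enquire.get_mset(0, top_n)

    # Mark each of those matches as relevant.
    rset = new_rset()
    for match in mset:
        rset.add_document(match.docid)

    # Ask for up to max_terms candidate terms based on that RSet.
    # Depending on the bindings version, terms may come back as bytes.
    eset = enquire.get_eset(max_terms, rset)
    return [item.term for item in eset]
```

With the real bindings this would be called as
suggest_terms(enquire, xapian.RSet, top_n=10); trying a few values of
top_n on typical queries is a quick way to see what suits a dataset.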

What you actually want is to feed the RSet only documents which are
really relevant.  One approach is to ask the user to mark them, but
this often isn't possible.  Another is to try to make use of log
information from previous searches in some way (but Xapian provides no
support for this, of course).

> The code calls
> get_mset(0,500) in order to have exact "number of results" for less than 500 
> results, so could conceivably feed up to 500 in, though I'm guessing that's 
> not that helpful/fast.

Probably not, indeed.  If you supply too many documents, many of them
won't really be relevant, so you're likely to get lots of irrelevant
terms thrown up.

Incidentally, if you're just passing 500 to get an accurate result
count, you might want to try using the "checkatleast" parameter for
that instead, e.g. get_mset(0, 20, 500).  That still returns only the
first 20 matches, but asks the matcher to check at least 500.
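
As a rough sketch with the Python bindings (the helper name
first_page_with_count and its defaults are invented here, and the
"exact" test assumes the count is exact whenever the lower and upper
bounds on the number of matches agree):

```python
def first_page_with_count(enquire, page_size=20, check_at_least=500):
    """Fetch one page of results plus a result count.

    enquire -- a xapian.Enquire on which set_query() was already called
    """
    # Third argument is checkatleast: only page_size matches are
    # returned, but at least check_at_least are considered.
    mset = enquire.get_mset(0, page_size, check_at_least)

    estimated = mset.get_matches_estimated()
    # Assumption: when both bounds agree, the estimate is exact.
    exact = (mset.get_matches_lower_bound()
             == mset.get_matches_upper_bound())
    return mset, estimated, exact
```

The returned MSet holds just the page to display, so there is no need
to pull 500 matches back only to count them.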

-- 
Richard
