[Xapian-discuss] How many docs to feed to an RSet?
Richard Boulton
richard at lemurconsulting.com
Fri Feb 29 10:58:57 GMT 2008
Matthew Somerville wrote:
> Hi,
>
> I'm just trying out get_eset() to fetch terms to go under a "possibly
> relevant terms" or similar heading on my search results pages. When
> compiling the RSet to feed to get_eset(), how many documents should I add?
> As I've just fetched results for a search, do I feed in all the result
> documents (a default of 20 on a page), or less?
The best answer is to play around, experiment, and see what seems to
work for you. The dataset, and the types of queries you're doing, will
both have a big effect.
I've found that feeding in the top 10 documents works well with some
datasets, but you may find that it gives terrible results with yours.
What you actually want is to feed the RSet only documents which are
really relevant. One approach is to ask the user, but this often isn't
possible. Another approach is to try to make use of log information
from previous searchers in some way (but Xapian provides no support
for this, of course).
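
As a rough sketch of the "top N results" approach described above,
using the Python bindings: the function name suggest_terms and its
parameters are made up for illustration, and the default of 10 is only
a starting point to experiment with. The caller is assumed to pass in
an Enquire with the query already set, the MSet it got back, and an
empty xapian.RSet().

```python
def suggest_terms(enquire, mset, rset, top_n=10, max_terms=10):
    """Return up to max_terms expansion terms based on the top_n matches.

    enquire: a xapian.Enquire with the query already set
    mset:    the xapian.MSet returned by enquire.get_mset(...)
    rset:    an empty xapian.RSet() supplied by the caller
    (suggest_terms, top_n and max_terms are hypothetical names.)
    """
    # Treat only the highest-ranked matches as "relevant".
    for position, match in enumerate(mset):
        if position >= top_n:
            break
        rset.add_document(match.docid)

    # Ask Xapian for the terms that best characterise those documents.
    eset = enquire.get_eset(max_terms, rset)
    return [item.term for item in eset]
```

You'd call it along the lines of suggest_terms(enquire, mset,
xapian.RSet()). If you do have relevance judgements from the user, the
same get_eset() call applies; you'd just add those document ids to the
RSet instead of the top-ranked ones.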
> The code calls
> get_mset(0,500) in order to have exact "number of results" for less than 500
> results, so could conceivably feed up to 500 in, though I'm guessing that's
> not that helpful/fast.
Probably not, indeed. If you supply too many documents, the
lower-ranked ones are less likely to be relevant, so you're likely to
get lots of irrelevant terms being thrown up.
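Incidentally, if you're just passing 500 to get an accurate result
count, you might want to try using the "checkatleast" parameter for
that instead, e.g. get_mset(0, 20, 500). That returns only the 20
results for the page, but still makes the matcher consider at least
500 matches when working out the count.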
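
A minimal sketch of the checkatleast approach with the Python
bindings, in case it helps. The helper name page_with_count and its
parameter names are invented for this example; get_mset() and the
get_matches_*() methods are the standard Enquire and MSet calls.

```python
def page_with_count(enquire, first=0, page_size=20, check_at_least=500):
    """Fetch one page of results plus a match count.

    enquire is assumed to be a xapian.Enquire with the query already set.
    (page_with_count and its parameter names are hypothetical.)
    """
    # Third argument is checkatleast: retrieve only page_size results,
    # but make the matcher consider at least check_at_least matches.
    mset = enquire.get_mset(first, page_size, check_at_least)

    # When the lower and upper bounds agree, the estimate is exact.
    exact = mset.get_matches_lower_bound() == mset.get_matches_upper_bound()
    return mset, mset.get_matches_estimated(), exact
```

With fewer than check_at_least matches in total, the bounds should
agree and the count can be shown as exact; otherwise you'd probably
want to display it as an estimate.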
--
Richard