[Xapian-discuss] Always returning ALL the documents matching a query

tata 668 tata668 at gmail.com
Sat Jan 10 13:03:59 GMT 2009


Thanks for the help Olly!

I think I'll continue to use Xapian the way I currently do but I'll investigate it more in depth 
when my project will be almost done.

Julien




Olly Betts wrote:
> On Mon, Dec 29, 2008 at 02:26:37PM -0500, tata 668 wrote:
>> If I want to manage the sorting outside Xapian, I have to get all the
>> posts resulting of the search in the first place.
> 
> The essential problem with getting all the results is that doing so is
> inherently at best O(total_number_of_results) in both time and memory.
> And that can easily end up being O(total_number_of_documents) - e.g.
> searching for "the" if the documents are in English.  In fact the time
> complexity will likely be O(N*log(N)) because we always perform a sort
> currently (this could potentially be avoided in some cases when the
> results are being ordered only by document id since the posting lists
> are sorted by document id).
> 
> If you work with the features Xapian provides and only request the
> number of documents you actually want to display then the matcher only
> needs to keep the best n documents seen so far (where n = mset_size +
> offset) which is a lot less memory for most queries.  Also various
> optimisation tricks which the matcher knows allow us to terminate early
> in many cases, so the time required is often much less (exactly how well
> these work depends on the query, the various options used, and the
> nature of the data).
> 
> So using Xapian as a simple "source of document ids" is really making it
> operate with its hands tied behind its back.  It'll do what you ask
> (unless you run out of memory), but you would do better to use the
> sorting and filtering facilities provided rather than ignoring them and
> implementing your own externally.
> 
> If you don't have many documents, you can probably get away with not
> worrying about efficiency, especially if you don't have a high query
> rate either.  But if you're handling large numbers of documents, you may
> run into problems, and small systems have a habit of growing...
> 
> Cheers,
>     Olly
> 



More information about the Xapian-discuss mailing list