sorting large msets

Eric Wong e at 80x24.org
Fri Apr 6 20:24:23 BST 2018


> > Olly Betts <olly at survex.com> wrote:
> > > 
> > > The reverse order (ENQ_ASCENDING) is really fast - about 0.0001 seconds.
> > > This is because in that case we can just stop once we've found 200
> > > matches.

With a few million documents, that ENQ_ASCENDING sounds promising :)

So, it looks like if I had ideal ordering, I could do something
along the lines of:

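	# Allocate docids downwards from the maximum so newer documents
	# get lower docids; resume from the value saved by a previous run.
	my $doc_id = $db->get_metadata('last_doc_id') || 0xffffffff;

	$db->replace_document($doc_id--, $_) foreach (@doc);

	# Despite the key name, this stores the next unused docid.
	$db->set_metadata('last_doc_id', $doc_id);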

And get killer performance.
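
On the query side, a minimal sketch of what I have in mind, assuming
boolean weighting so that docid order alone decides the result order.
The database path and the 'Tmail' term are placeholders, not anything
from a real setup:

```perl
use strict;
use warnings;
use Search::Xapian qw(:all);

# Hypothetical path; any existing Xapian database would do.
my $db = Search::Xapian::Database->new('/path/to/xapian.db');
my $enq = Search::Xapian::Enquire->new($db);

# 'Tmail' is a made-up boolean term used only for illustration.
$enq->set_query(Search::Xapian::Query->new('Tmail'));

# Constant weights: no relevance ranking, so results come back in
# docid order.
$enq->set_weighting_scheme(Search::Xapian::BoolWeight->new);
$enq->set_docid_order(ENQ_ASCENDING);

# Ask for the first 200 matches only.
my $mset = $enq->get_mset(0, 200);
print $_->get_docid, "\n" foreach $mset->items;
```

If docids were assigned downwards as above, ascending docid order
would return the most recently added documents first.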

Olly Betts <olly at survex.com> wrote:
> On Sat, Mar 31, 2018 at 12:58:19AM +0000, Eric Wong wrote:
> > Would it be possible to teach Xapian to optimize its storage for
> > certain queries so it can stop once it's found 200 matches?
> > From what I recall, SQL implementations are pretty good at that.
> 
> Probably - e.g. tracking some sort of buckets for values would help here
> as well as for some other uses of values.

Alright.  I'll keep an eye out for that in coming years.

I've moved some queries (OVER/XOVER) to SQLite, which is already
a dependency of public-inbox (the results sometimes need sorting
by article number, sometimes by timestamp).
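
Roughly, the SQLite side looks like the sketch below. The table and
column names are invented for illustration and are not the actual
public-inbox schema. It assumes DBI and DBD::SQLite are installed.
With an index on each sort key, SQLite should be able to walk the
index in order and stop at the LIMIT instead of sorting every row:

```perl
use strict;
use warnings;
use DBI;

# In-memory database; table and column names are hypothetical.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
	{ RaiseError => 1, AutoCommit => 1 });

# num is the article number (also the rowid), ts is a Unix timestamp.
$dbh->do('CREATE TABLE msg (num INTEGER PRIMARY KEY, ts INTEGER NOT NULL)');
$dbh->do('CREATE INDEX idx_ts ON msg (ts)');

# Fill in some synthetic rows with scrambled timestamps.
my $ins = $dbh->prepare('INSERT INTO msg (num, ts) VALUES (?, ?)');
$dbh->begin_work;
$ins->execute($_, 1522000000 + ($_ * 7919 % 1000)) foreach (1..1000);
$dbh->commit;

# Sort by article number: served by the primary key.
my $by_num = $dbh->selectcol_arrayref(
	'SELECT num FROM msg ORDER BY num ASC LIMIT 200');

# Sort by timestamp, newest first: served by idx_ts.
my $by_ts = $dbh->selectcol_arrayref(
	'SELECT num FROM msg ORDER BY ts DESC LIMIT 200');

printf "%d rows by num, %d rows by ts\n",
	scalar(@$by_num), scalar(@$by_ts);
```

Prefixing either SELECT with EXPLAIN QUERY PLAN is one way to check
whether an index is actually being used for the ordering.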


