sorting large msets

Olly Betts olly at survex.com
Mon Apr 9 07:18:37 BST 2018


On Fri, Apr 06, 2018 at 07:24:23PM +0000, Eric Wong wrote:
> > > Olly Betts <olly at survex.com> wrote:
> > > > 
> > > > The reverse order (ENQ_ASCENDING) is really fast - about 0.0001 seconds.
> > > > This is because in that case we can just stop once we've found 200
> > > > matches.
> 
> With a few million documents, that ENQ_ASCENDING sounds promising :)
>
> So, it looks like if I had ideal ordering, I could do something
> along the lines of:
> 
> 	my $doc_id = $db->get_metadata('last_doc_id') || 0xffffffff;
> 
> 	$db->replace_document($doc_id--, $_) foreach (@doc);
> 
> 	$db->set_metadata('last_doc_id', $doc_id);
> 
> And get killer performance.

Yes, though calling replace_document() with explicit descending docids is
likely to be slower to index than this, since appending a document is
handled more efficiently:

	$db->add_document($_) foreach (reverse @doc);
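
To make the ordering concrete, here's a toy model in plain Perl. It makes no Xapian API calls: the subs assign_descending, assign_by_append and first_matches are made up for illustration, and a hash stands in for the database. It only shows that both loops leave the documents in the same relative docid order, and that an ascending-docid walk can stop once it has collected enough matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model only: plain Perl data structures stand in for the database.
# None of these subs are Xapian API; the names are hypothetical.

# Model of the replace_document() loop: explicit docids counting down.
sub assign_descending {
    my ($start, @doc) = @_;
    my %docid_of;
    my $doc_id = $start;
    $docid_of{$_} = $doc_id-- foreach (@doc);
    return \%docid_of;
}

# Model of the add_document() loop: each append takes the next free docid.
sub assign_by_append {
    my (@doc) = @_;
    my %docid_of;
    my $next = 1;
    $docid_of{$_} = $next++ foreach (reverse @doc);
    return \%docid_of;
}

# Model of an ascending-docid match: walk docids upwards and stop as soon
# as $limit matches have been collected. The sort here is only needed
# because the model uses a hash; it is not part of the cost being modelled.
sub first_matches {
    my ($docid_of, $limit, $is_match) = @_;
    my @hits;
    my $examined = 0;
    foreach my $doc (sort { $docid_of->{$a} <=> $docid_of->{$b} } keys %$docid_of) {
        $examined++;
        push @hits, $doc if $is_match->($doc);
        last if @hits >= $limit;
    }
    return (\@hits, $examined);
}

my @doc = map { "msg$_" } (1 .. 1000);
my $by_replace = assign_descending(0xffffffff, @doc);
my $by_append  = assign_by_append(@doc);

my $all = sub { 1 };    # every document matches
my ($hits_r) = first_matches($by_replace, 3, $all);
my ($hits_a, $examined) = first_matches($by_append, 3, $all);

print "replace: @$hits_r\n";
print "append:  @$hits_a (examined $examined of ", scalar(@doc), ")\n";
```

In this model both schemes return msg1000, msg999, msg998 first, and the walk examines only 3 of the 1000 entries before stopping.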

Cheers,
    Olly


