sorting large msets
Eric Wong
e at 80x24.org
Fri Apr 6 20:24:23 BST 2018
> > Olly Betts <olly at survex.com> wrote:
> > >
> > > The reverse order (ENQ_ASCENDING) is really fast - about 0.0001 seconds.
> > > This is because in that case we can just stop once we've found 200
> > > matches.
With a few million documents, that ENQ_ASCENDING sounds promising :)
So, it looks like if I had ideal ordering, I could do something
along the lines of:

    my $doc_id = $db->get_metadata('last_doc_id') || 0xffffffff;
    $db->replace_document($doc_id--, $_) foreach (@doc);
    $db->set_metadata('last_doc_id', $doc_id);

And get killer performance.
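
To make the intent of that snippet concrete, here is a minimal sketch of the same bookkeeping in plain Python, with no Xapian involved. It assumes the goal is for newer documents to get lower docids, so that walking docids in ascending order returns the newest documents first. The names `assign_descending_docids` and `first_n_ascending` are hypothetical, and a dict stands in for the database.

```python
MAX_DOCID = 0xFFFFFFFF  # starting point used in the Perl snippet


def assign_descending_docids(last_doc_id, docs):
    """Give each new document the next lower docid.

    Mirrors the Perl: fall back to MAX_DOCID when no counter is stored,
    post-decrement per document, and return the next free docid so the
    caller can store it (the 'last_doc_id' metadata key in the Perl).
    """
    doc_id = last_doc_id if last_doc_id else MAX_DOCID
    assigned = {}
    for doc in docs:
        assigned[doc_id] = doc
        doc_id -= 1
    return assigned, doc_id


def first_n_ascending(index, n):
    """Return the first n documents in ascending docid order.

    sorted() is only a stand-in here: a real index would already hold
    docids in order and could stop after n entries.
    """
    return [index[d] for d in sorted(index)[:n]]


index = {}  # hypothetical stand-in for the database

batch1, next_id = assign_descending_docids(None, ["msg-1", "msg-2", "msg-3"])
index.update(batch1)

# A later indexing run resumes from the stored counter.
batch2, next_id = assign_descending_docids(next_id, ["msg-4", "msg-5"])
index.update(batch2)

newest = first_n_ascending(index, 2)
print(newest)
```

The most recently added documents come out first because they hold the smallest docids. Whether Xapian's storage behaves well with docids allocated from the top of the range is not something this sketch addresses.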
Olly Betts <olly at survex.com> wrote:
> On Sat, Mar 31, 2018 at 12:58:19AM +0000, Eric Wong wrote:
> > Would it be possible to teach Xapian to optimize its storage for
> > certain queries so it can stop once it's found 200 matches?
> > From what I recall, SQL implementations are pretty good at that.
>
> Probably - e.g. tracking some sort of buckets for values would help here
> as well as for some other uses of values.
Alright. I'll keep an eye out for that in coming years.
I've moved some queries (OVER/XOVER) to SQLite, which is already
a dependency of public-inbox (those queries sometimes sort by
article number and sometimes by timestamp).
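
As a rough illustration of why SQLite suits this, the sketch below uses Python's standard sqlite3 module to show that an `ORDER BY ... LIMIT 200` query can be answered by walking an index, or the rowid, in order, with no separate sort step. The table name `overview`, its columns, and the index name are assumptions made for the example, not the actual public-inbox schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: 'num' is the article number (the rowid),
# 'ts' is a timestamp with its own index.
conn.execute(
    "CREATE TABLE overview ("
    " num INTEGER PRIMARY KEY,"
    " ts INTEGER NOT NULL,"
    " subject TEXT)"
)
conn.execute("CREATE INDEX idx_overview_ts ON overview (ts)")

# Synthetic rows with distinct, scrambled timestamps.
conn.executemany(
    "INSERT INTO overview (num, ts, subject) VALUES (?, ?, ?)",
    (
        (n, 1_500_000_000 + (n * 7919) % 100_000, f"subject {n}")
        for n in range(1, 5001)
    ),
)


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


by_ts = "SELECT num, ts FROM overview ORDER BY ts DESC LIMIT 200"
by_num = "SELECT num, subject FROM overview ORDER BY num LIMIT 200"

rows_by_ts = conn.execute(by_ts).fetchall()
rows_by_num = conn.execute(by_num).fetchall()

# A sort step would appear in the plan as "USE TEMP B-TREE FOR ORDER BY".
print(plan(by_ts))
print(plan(by_num))
```

If neither plan mentions a temporary B-tree, SQLite is reading rows in the requested order and can stop after 200 of them, which is the behaviour asked about for Xapian earlier in the thread.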