sorting large msets

Eric Wong e at 80x24.org
Sat Mar 31 01:58:19 BST 2018


Olly Betts <olly at survex.com> wrote:
> On Fri, Mar 30, 2018 at 05:21:43PM +0000, Eric Wong wrote:
> > Hello, is there a way to optimize sorting by certain values
> > for queries which return a huge amount of results?
> [...]
> > $enquire->set_sort_by_value_then_relevance(0, 1);
> 
> If you're just wanting the 200 newest, it'll be faster not to calculate
> weights, so:
> 
> $enquire->set_sort_by_value(0, 1);
> $enquire->set_weighting_scheme(new Xapian::BoolWeight());
> 
> For me, this drops the time from ~0.075 seconds to ~0.067 seconds (with
> xapian-core 1.4.5).

Thanks, I can see how that helps.

> But even 0.075 seconds doesn't really seem "slow" to me.  What times
> are you seeing?  If it's much slower, I'd make sure you're at least
> using the latest 1.4.x release.

Roughly what you saw with $n = 100 (the default in my sample
script).  The problem is that the time increases with DB size:
setting $n to 1000 makes it roughly 0.750s.

> If you do want faster, the simplest solution is to arrange that the
> document id order matches the document age order, and then you can
> specify to just sort by that:
> 
> $enquire->set_weighting_scheme(new Xapian::BoolWeight());
> $enquire->set_docid_order(Search::Xapian::ENQ_DESCENDING);

That would be tricky with emails being delivered out-of-order;
not to mention old archives being imported + indexed.

> That's more like 0.053 seconds for 1.4.5 and 0.021 seconds for git
> master with glass.
> 
> The reverse order (ENQ_ASCENDING) is really fast - about 0.0001 seconds.
> This is because in that case we can just stop once we've found 200
> matches.

So that sounds like it's O(1) and independent of how many
documents are in the mset?
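
To make the difference concrete, here is a toy model in plain Perl.
It is not Xapian's actual matcher; the sub names and the synthetic
timestamps are made up for illustration.  It assumes matches arrive in
ascending docid order and counts how many matches each strategy has to
look at before it can return $k results:

```perl
use strict;
use warnings;

# Toy model only: each match is a hash with a docid and a timestamp "ts".
# Matches are assumed to arrive in ascending docid order.

# Docid order: the first $k matches seen are the answer, so stop there.
sub first_k_by_docid {
    my ($matches, $k) = @_;
    my @top;
    my $examined = 0;
    for my $m (@$matches) {
        last if @top >= $k;
        $examined++;
        push @top, $m->{docid};
    }
    return (\@top, $examined);
}

# Value order: the newest match could be anywhere, so every match must be
# examined before the top $k are known.
sub newest_k_by_value {
    my ($matches, $k) = @_;
    my @sorted = sort { $b->{ts} <=> $a->{ts} } @$matches;
    splice(@sorted, $k) if @sorted > $k;
    return ([ map { $_->{docid} } @sorted ], scalar @$matches);
}

# Synthetic data: timestamps deliberately unrelated to docid order.
my @matches = map { +{ docid => $_, ts => ($_ * 7919) % 100_003 } } 1 .. 100_000;

my ($by_docid, $seen_docid) = first_k_by_docid(\@matches, 200);
my ($by_value, $seen_value) = newest_k_by_value(\@matches, 200);

printf "docid order: %d results, %d matches examined\n",
    scalar @$by_docid, $seen_docid;
printf "value order: %d results, %d matches examined\n",
    scalar @$by_value, $seen_value;
```

In this model the docid-order version examines 200 matches while the
value-sorted one examines all 100,000, so the cost of the former
depends on $k rather than on the size of the mset.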

Would it be possible to teach Xapian to optimize its storage for
certain queries so it can stop once it's found 200 matches?
From what I recall, SQL implementations are pretty good at that.


