sorting large msets

Olly Betts olly at survex.com
Sat Mar 31 00:51:35 BST 2018


On Fri, Mar 30, 2018 at 05:21:43PM +0000, Eric Wong wrote:
> Hello, is there a way to optimize sorting by certain values
> for queries which return a huge amount of results?
[...]
> $enquire->set_sort_by_value_then_relevance(0, 1);

If you're just wanting the 200 newest, it'll be faster not to calculate
weights, so:

$enquire->set_sort_by_value(0, 1);
$enquire->set_weighting_scheme(Search::Xapian::BoolWeight->new());

For me, this drops the time from ~0.075 seconds to ~0.067 seconds (with
xapian-core 1.4.5).

If I use xapian git master (still using the glass backend) then it's
~0.051 seconds with weights and ~0.045 seconds without.

If I use the new (but still in development) honey backend it's ~0.049
seconds with weights and ~0.044 seconds without.

But even 0.075 seconds doesn't really seem "slow" to me.  What times
are you seeing?  If it's much slower, I'd make sure you're at least
using the latest 1.4.x release.

If you do want it faster, the simplest solution is to arrange that the
document id order matches the document age order, and then you can
specify to just sort by that:

$enquire->set_weighting_scheme(Search::Xapian::BoolWeight->new());
$enquire->set_docid_order(Search::Xapian::ENQ_DESCENDING);

That's more like 0.053 seconds for 1.4.5 and 0.021 seconds for git
master with glass.
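
To make the docid approach concrete, here is a minimal sketch, assuming
the Search::Xapian bindings and that documents are added oldest first to
a fresh database, so that docids increase with age order.  The in-memory
database, the 'XALL' term and the document data are hypothetical
stand-ins for a real index:

```perl
use strict;
use warnings;
use Search::Xapian;

# Assumption: an in-memory database stands in for the real index.
my $db = Search::Xapian::WritableDatabase->new();

# Add documents oldest first; in a fresh database add_document()
# hands out increasing docids, so docid order follows age order.
for my $n (1 .. 1000) {
    my $doc = Search::Xapian::Document->new();
    $doc->add_term('XALL');          # hypothetical term on every document
    $doc->set_data("message $n");    # hypothetical payload
    $db->add_document($doc);
}

my $enquire = Search::Xapian::Enquire->new($db);
$enquire->set_query(Search::Xapian::Query->new('XALL'));

# No weights, highest docid (newest) first.
$enquire->set_weighting_scheme(Search::Xapian::BoolWeight->new());
$enquire->set_docid_order(Search::Xapian::ENQ_DESCENDING);

my $mset   = $enquire->get_mset(0, 200);
my @docids = map { $_->get_docid() } $mset->items();

printf "got %d results, first docid %d, last docid %d\n",
    scalar(@docids), $docids[0], $docids[-1];
```

If documents can arrive out of age order, this simple scheme no longer
holds and the docids would need to be assigned some other way.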

The reverse order (ENQ_ASCENDING) is really fast - about 0.0001 seconds.
This is because in that case we can just stop once we've found 200
matches.
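
The difference can be illustrated with a toy model in plain Perl.  This
is only a sketch of the idea, not Xapian's matcher: it assumes matching
docids are visited in ascending order, and the sub names and the list of
matching docids are made up for illustration:

```perl
use strict;
use warnings;

# Ascending order: the first $n matches seen are already the answer,
# so we can stop as soon as we have them.
sub first_n_ascending {
    my ($matches, $n) = @_;
    my @top;
    my $visited = 0;
    for my $docid (@$matches) {
        $visited++;
        push @top, $docid;
        last if @top == $n;
    }
    return (\@top, $visited);
}

# Descending order: the answer is the $n highest docids, which we only
# know once every match has been visited.
sub last_n_descending {
    my ($matches, $n) = @_;
    my @top;
    my $visited = 0;
    for my $docid (@$matches) {
        $visited++;
        push @top, $docid;
        shift @top if @top > $n;    # keep only the $n highest so far
    }
    return ([reverse @top], $visited);
}

# Hypothetical match list: every third docid out of 300000.
my @matches = grep { $_ % 3 == 0 } 1 .. 300_000;

my ($asc,  $asc_visited)  = first_n_ascending(\@matches, 200);
my ($desc, $desc_visited) = last_n_descending(\@matches, 200);

printf "ascending:  visited %d matches\n", $asc_visited;
printf "descending: visited %d matches\n", $desc_visited;
```

In this model the ascending case visits only 200 matches, while the
descending case has to visit all of them.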

Cheers,
    Olly


