[Xapian-discuss] Optimization and Load balancing with Xapian

Olly Betts olly at survex.com
Thu Feb 16 21:48:38 GMT 2006

On Thu, Feb 16, 2006 at 05:38:38PM +0200, David Levy wrote:
> Also I ask for the first 5 hits in the omega request (HITSPERPAGE
> parameter; is there a better way?)

HITSPERPAGE is the right way to specify that.

> > It's not the actual sorting which takes the extra time - the issue is
> > that for a multi-term query, relevance ranking can terminate early in
> > many cases (often when we reach the end of the matches for any of the
> > terms).  But if results are sorted on a value, we need to consider every
> > result which matches the query.
> so you are telling me I won't be able to improve my calculation time if I
> still use sorting ...?

You can try all the usual things to speed up searches - lots of RAM,
fast disks, compacting the database, etc.  Using flint instead of quartz
may help too.  Some of the changes I have planned for flint should
also make a significant difference - the way values are currently
stored doesn't lend itself to fast access in this case.

But sorting as currently designed does need to process every matching
document, which is going to be slow for a large database if the query
matches a lot of documents.
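
To illustrate why the cost grows with the number of matches rather
than with the number of hits requested, here is a minimal sketch in
plain Python.  It is not Xapian code: top_k_by_value, fetch_value and
the sample data are hypothetical names invented for the illustration.
The point is only that one value has to be fetched per matching
document before the top 5 can be known.

```python
import heapq


def top_k_by_value(matching_docids, fetch_value, k):
    """Return the k matches with the largest sort value and the lookup count."""
    lookups = 0
    keyed = []
    for docid in matching_docids:
        # One value fetch per matching document; none can be skipped,
        # because any document might hold one of the k largest values.
        keyed.append((fetch_value(docid), docid))
        lookups += 1
    best = heapq.nlargest(k, keyed)
    return [docid for _, docid in best], lookups


# Hypothetical data: 100000 matching documents with arbitrary sort values.
values = {docid: (docid * 7919) % 100003 for docid in range(1, 100001)}

top5, lookups = top_k_by_value(values.keys(), values.__getitem__, 5)
print(top5, lookups)
```

Asking for 5 hits or 500 changes only the final selection step; the
number of value lookups stays equal to the number of matches.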

> Is there any other way to get results sorted by a criterion other than
> relevance ?

If you have only one sort order, and can arrange to add documents in
that order, then you can just use the raw document order for your
sorted search.  This works particularly well for date ordering, since
newly arrived documents end up in the right place.  That's how the
Gmane search implements sort-by-date.
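
A small sketch of why this works, again in plain Python rather than
Xapian code (the intersect helper, the terms and the dates are made up
for the example).  If document ids are handed out in arrival order,
then the ascending docid lists for each term are already in date
order, and so is their intersection, with no stored value consulted
during matching.

```python
def intersect(postlist_a, postlist_b):
    """AND two ascending docid lists; the result is ascending as well."""
    out, i, j = [], 0, 0
    while i < len(postlist_a) and j < len(postlist_b):
        a, b = postlist_a[i], postlist_b[j]
        if a == b:
            out.append(a)
            i += 1
            j += 1
        elif a < b:
            i += 1
        else:
            j += 1
    return out


# Hypothetical index: docids 1..6 were assigned in arrival (date) order.
dates = {
    1: "2006-01-03", 2: "2006-01-09", 3: "2006-01-17",
    4: "2006-02-01", 5: "2006-02-08", 6: "2006-02-16",
}
postlists = {"xapian": [1, 2, 4, 6], "flint": [2, 3, 4, 6]}

hits = intersect(postlists["xapian"], postlists["flint"])
# The dates are used here only to display the outcome, not to sort.
print([(docid, dates[docid]) for docid in hits])
```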

Actually, an interesting thing to note is that "sort by reverse date"
can terminate early, while "sort by date" has to scan the whole docid
range (I plan to allow running postlists backwards which will make
"sort by date" as fast as "sort by reverse date" but I've not
implemented that yet).
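
The asymmetry can be sketched with a forward-only iterator.  This is
an illustration in plain Python, not Xapian's postlist implementation;
CountingPostlist, first_k and last_k are hypothetical names.  It
assumes the cheap direction is the one that follows forward iteration
over ascending docids, and that the other direction needs the far end
of the list.

```python
from collections import deque
from itertools import islice


class CountingPostlist:
    """Forward-only iterator over ascending docids that counts entries read."""

    def __init__(self, docids):
        self._it = iter(docids)
        self.read = 0

    def __iter__(self):
        return self

    def __next__(self):
        docid = next(self._it)  # raises StopIteration at the end
        self.read += 1
        return docid


def first_k(postlist, k):
    """Hits in forward docid order: stop as soon as k have been seen."""
    return list(islice(postlist, k))


def last_k(postlist, k):
    """Hits in backward docid order: the whole list must be walked first."""
    return list(deque(postlist, maxlen=k))[::-1]


forward = CountingPostlist(range(1, 100001))
backward = CountingPostlist(range(1, 100001))

print(first_k(forward, 5), forward.read)
print(last_k(backward, 5), backward.read)
```

A postlist that could be iterated from the high end would let last_k
stop after k entries too, which is the effect the planned change is
meant to have.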

But even now, "sort by date" is still acceptably fast on 30 million
documents, which strongly suggests that accessing the values is what
takes most of the time when sorting on a value.

