[Xapian-discuss] what is the fastest way to fetch results which are sorted by timestamp ?

Richard Boulton richard at tartarus.org
Tue Aug 9 18:04:34 BST 2011


On 9 August 2011 17:48, makao009 <makao009 at 126.com> wrote:
> what is the fastest way to fetch results which are sorted by timestamp ?

The fastest possible way is to have your index sorted by timestamp
(i.e., such that document IDs increase as the timestamp increases).
That way, the search can stop as soon as sufficient matches have been
found.  It can be very awkward to get an index in such order though,
particularly in the face of updates, assuming that you want the sort
order to show most recent first.
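
To illustrate why a sorted index lets the search stop early, here is a
minimal pure-Python sketch. It is not Xapian code: the helper names
(counted, first_n, sort_all) and the data are hypothetical, and it
assumes document IDs were assigned in the order the results should come
back, with posting lists read in ascending document ID order.

```python
from itertools import islice

def counted(docids, counter):
    """Yield docids one at a time, counting how many the matcher pulled."""
    for docid in docids:
        counter["examined"] += 1
        yield docid

def first_n(postings, n):
    """Index already in result order: take the first n matches and stop."""
    return list(islice(postings, n))

def sort_all(postings, timestamps, n):
    """Unsorted index: visit every match, then sort by stored timestamp."""
    return sorted(postings, key=timestamps.__getitem__, reverse=True)[:n]

# Hypothetical data: docid 1 is the newest document, docid 1000 the oldest.
timestamps = {docid: 2_000_000 - docid for docid in range(1, 1001)}
matching = [docid for docid in timestamps if docid % 2 == 0]

early = {"examined": 0}
full = {"examined": 0}
top_early = first_n(counted(matching, early), 5)
top_full = sort_all(counted(matching, full), timestamps, 5)

# Same five results either way, but very different amounts of work.
print(top_early == top_full, early["examined"], full["examined"])
```

Both approaches return the same documents here; the difference is that
the sorted layout touches only as many postings as results requested,
while sorting by value has to visit every match first.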

> I want to use Xapian as my search engine, using add_boolean_term(something) and add_value(0, sortable_serialise(get_timestamp())) on each doc.
> I search with enquire.set_weighting_scheme(xapian.BoolWeight()) and enquire.set_sort_by_value(0, True) to ensure that the results are sorted by the timestamp.

That's another approach, certainly.

> This method is OK, but is there a faster way to do that? I have millions of records.

Sorting the database, or some variant of that, is the way to get
really fast sorted results.

There's a variation I experimented with using Xappy, involving sorting
as much of the database as possible, keeping track of the range of
document IDs for which the values were sorted, and using a custom
PostingSource to take advantage of that knowledge to skip past the
document IDs which were known to be at too low a value.  This worked
pretty well (not quite as fast as using a fully sorted database), but
it is quite fiddly to maintain the ordering (and you need to use a custom
PostingSource, so if you're using one of the language bindings, you'd
need to compile your own custom Xapian).
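
As a rough illustration of the skipping idea, here is a pure-Python
sketch. It is not the Xappy code and does not use the real PostingSource
API; it is only one possible reading of the approach. The function name
top_n_with_skip, the sorted range bounds and the data are all
hypothetical. It assumes that within the range sorted_lo..sorted_hi the
values never increase as the document ID increases, so once a document
in that range is too low to enter the top n, the rest of the range can
be skipped.

```python
import bisect
import heapq

def top_n_with_skip(matches, values, sorted_lo, sorted_hi, n):
    """Return (top n docids by value, number of documents examined).

    matches: ascending list of matching docids.
    values:  dict mapping docid -> sort value (e.g. a timestamp).
    Assumption: for docids in [sorted_lo, sorted_hi], values are
    non-increasing as the docid increases.
    """
    heap = []      # min-heap of (value, docid): the best n seen so far
    examined = 0
    i = 0
    while i < len(matches):
        docid = matches[i]
        value = values[docid]
        examined += 1
        if len(heap) < n:
            heapq.heappush(heap, (value, docid))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, docid))
        elif sorted_lo <= docid <= sorted_hi:
            # Everything later in the sorted range is at least as low,
            # so jump to the first match beyond the range (a "skip_to").
            i = bisect.bisect_right(matches, sorted_hi, i)
            continue
        i += 1
    best = sorted(heap, reverse=True)
    return [docid for _, docid in best], examined

# Hypothetical index: docids 1..1000 are sorted newest first; a few
# later additions (docids above 1000) arrived in arbitrary order.
values = {docid: 100_000 - 10 * docid for docid in range(1, 1001)}
values.update({1002: 200_000, 1005: 7, 1008: 99_900})
matches = [docid for docid in sorted(values) if docid % 3 == 0]

top, examined = top_n_with_skip(matches, values, 1, 1000, 5)
brute = sorted(matches, key=values.__getitem__, reverse=True)[:5]
print(top == brute, examined, len(matches))
```

The unsorted tail still has to be checked document by document, which is
consistent with this being somewhat slower than a fully sorted database.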

-- 
Richard


