[Xapian-discuss] what is the fastest way to fetch results which are sorted by timestamp ?

Henry C. henka at cityweb.co.za
Thu Aug 11 11:18:02 BST 2011


On Tue, August 9, 2011 19:04, Richard Boulton wrote:
> Sorting the database, or some variant of that, is the way to get
> really fast sorted results.
>
> There's a variation I experimented with using Xappy, involving sorting
> as much of the database as possible, keeping track of the range of document IDs
> for which the values were sorted, and using a custom PostingSource to take
> advantage of that knowledge to skip past the document IDs which were known to
> be at too low a value.  This worked pretty well (not quite as fast as using a
> fully sorted database), but is quite fiddly to maintain the ordering (and you
> need to use a custom PostingSource, so if you're using one of the language
> bindings, you'd need to compile your own custom Xapian).
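
To make the partially sorted variant quoted above a bit more concrete, here is
a minimal sketch in plain Python. It does not use Xapian or a real
PostingSource. It assumes doc IDs 1..sorted_end were assigned in descending
order of the sort value and that anything added later is unsorted. The names
top_k_partially_sorted, posting, values and sorted_end are made up for
illustration, and this is a simplification of the idea, not the Xappy code.

```python
import heapq
from bisect import bisect_right


def top_k_partially_sorted(posting, values, sorted_end, k):
    """Return the k matching doc IDs with the highest sort value.

    posting    -- ascending list of doc IDs matching the query
    values     -- dict mapping doc ID -> sort value
    sorted_end -- assumption: doc IDs 1..sorted_end are in descending
                  value order; higher doc IDs are in arbitrary order
    """
    # Index of the first matching doc ID beyond the sorted range.
    split = bisect_right(posting, sorted_end)
    # Inside the sorted range the first k matches are already the best k,
    # so every later match in that range can be skipped unread.
    candidates = posting[:min(k, split)]
    # The unsorted tail has to be examined in full.
    candidates += posting[split:]
    return heapq.nlargest(k, candidates, key=values.__getitem__)


# Doc IDs 1..4 are sorted by value; 5..7 were added afterwards.
values = {1: 9.0, 2: 8.0, 3: 7.0, 4: 6.0, 5: 1.0, 6: 8.5, 7: 3.0}
print(top_k_partially_sorted([1, 3, 4, 6, 7], values, sorted_end=4, k=2))
```

The saving comes from the skipped part of the sorted range, so the smaller the
unsorted tail, the closer this gets to a fully sorted database.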

It's a real pity xapian-compact doesn't have a --sort-by-value option to
do some basic sorting of the documents after indexing.

Is something like this even possible (by that I mean as a change to the
xapian-compact code)?

In our case we're indexing hundreds of millions of docs into several hundred
thousand sub-indexes, then merging those into the hundreds of indexes that are
actually searched.  Keeping all of that sorted (by a custom "PageRank" value)
at index time to maximise sorted-result performance is nigh on impossible.
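
For illustration, this is roughly what I imagine such a post-indexing sort
pass would have to do, sketched in plain Python rather than against the
Xapian API. It assumes each document carries a single numeric sort value and
a set of terms. renumber_by_value and top_k are hypothetical names, and no
such option exists in xapian-compact as far as I know.

```python
from itertools import islice


def renumber_by_value(docs):
    """Assign new doc IDs in descending sort-value order.

    docs -- dict mapping old doc ID -> (sort_value, set_of_terms)
    Returns (new_docs, postings), where new_docs maps
    new ID -> (old ID, sort_value) and postings maps
    term -> ascending list of new IDs.
    """
    # Highest value first; ties are broken by old doc ID for stability.
    ordered = sorted(docs.items(), key=lambda kv: (-kv[1][0], kv[0]))
    new_docs = {}
    postings = {}
    for new_id, (old_id, (value, terms)) in enumerate(ordered, start=1):
        new_docs[new_id] = (old_id, value)
        for term in terms:
            # New IDs are handed out in increasing order, so each
            # posting list stays sorted without any extra work.
            postings.setdefault(term, []).append(new_id)
    return new_docs, postings


def top_k(postings, term, k):
    """The first k postings are the k highest-valued matches."""
    return list(islice(postings.get(term, []), k))


docs = {
    10: (0.2, {"xapian", "sort"}),
    11: (0.9, {"xapian"}),
    12: (0.5, {"sort"}),
    13: (0.7, {"xapian", "sort"}),
}
new_docs, postings = renumber_by_value(docs)
best = top_k(postings, "xapian", 2)
print([new_docs[i][0] for i in best])
```

Once doc ID order matches value order, a sorted query only has to read the
start of each posting list, which is why a fully sorted database is so fast.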



