[Xapian-discuss] what is the fastest way to fetch results which are sorted by timestamp ?

Henry C. henka at cityweb.co.za
Thu Aug 11 15:13:47 BST 2011


On Thu, August 11, 2011 13:17, Richard Boulton wrote:
>> In our case we're indexing hundreds of millions of docs to several hundred
>> thousand sub-indexes, then merging those to hundreds of indexes which are
>> searched against.  Keeping all that sorted (by custom "PageRank" value)
>> during index-time to maximise sorted-result performance is nigh on
>> impossible.
>
> Indeed.  What might be more possible would be to partition the
> documents by pagerank value into several shards; so that low values go into one
> bucket, middle into another, high into another still.  Then, it should be
> possible to merge the databases keeping these shards together and in
> decreasing weight order (by assigning document IDs in particular ranges to
> each shard, and using the option of xapian-compact that preserves docids), and
> to keep track of the maximum value in each shard.  You can then use a custom
> PostingSource that knows the maximum value within each range of document IDs
> in the output database, allowing the search to terminate early in many cases.
>
> Sorry, that's not a very coherent explanation - it's another technique
> that I've experimented with in the past, with some success, but again it's
> quite tricky to implement.

Hmm, now you've gone and done it: you've turned my minor itch into a raging
rash.

Thank you for that thoughtful reply; it's given me ideas to play with -- given
time (sigh).
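
To check my understanding of the scheme quoted above, here is a rough sketch
in plain Python.  It does not touch Xapian at all: a dict stands in for the
merged database, and the loop in top_k() stands in for what a custom
PostingSource would have to do.  The names assign_docids, top_k, boundaries
and range_size are all made up for illustration, and the bucket boundaries
are arbitrary.  In a real setup the docid ranges would have to survive the
merge, which (if I read the docs correctly) is what xapian-compact's
--no-renumber option is for.

```python
import heapq
from bisect import bisect_right


def assign_docids(docs, boundaries, range_size):
    """Give each (name, rank) doc a docid inside its shard's docid range.

    boundaries are ascending rank cut points.  Shard 0 holds the highest
    ranks and owns docids 1..range_size, shard 1 owns the next range, etc.
    Returns the docid -> (name, rank) map and the maximum rank per shard.
    """
    n_shards = len(boundaries) + 1
    next_free = [s * range_size + 1 for s in range(n_shards)]
    shard_max = [float("-inf")] * n_shards
    index = {}
    for name, rank in docs:
        # Higher rank -> lower shard number -> lower docids.
        shard = n_shards - 1 - bisect_right(boundaries, rank)
        docid = next_free[shard]
        if docid > (shard + 1) * range_size:
            raise ValueError("shard %d overflowed its docid range" % shard)
        next_free[shard] += 1
        shard_max[shard] = max(shard_max[shard], rank)
        index[docid] = (name, rank)
    return index, shard_max


def top_k(matching_docids, index, shard_max, range_size, k):
    """Return (k best (rank, name) pairs, number of docs actually scanned)."""
    best = []  # min-heap, so best[0] is the current k-th best
    current_shard = -1
    scanned = 0
    for docid in sorted(matching_docids):
        shard = (docid - 1) // range_size
        if shard != current_shard:
            # Shards are in decreasing rank order, so if this shard's
            # maximum cannot beat the k-th best, nothing later can either.
            if len(best) == k and shard_max[shard] <= best[0][0]:
                break
            current_shard = shard
        name, rank = index[docid]
        scanned += 1
        if len(best) < k:
            heapq.heappush(best, (rank, name))
        elif rank > best[0][0]:
            heapq.heapreplace(best, (rank, name))
    return sorted(best, reverse=True), scanned


docs = [("a", 0.95), ("b", 0.8), ("c", 0.5),
        ("d", 0.4), ("e", 0.1), ("f", 0.05)]
index, shard_max = assign_docids(docs, boundaries=[0.3, 0.7], range_size=100)
results, scanned = top_k(index.keys(), index, shard_max, range_size=100, k=2)
print(results, scanned)
```

Note that documents are not sorted inside a shard, so a shard that has been
entered must be scanned to its end; the saving only comes from skipping whole
later shards.  More, narrower shards would mean earlier termination, at the
cost of more bookkeeping at index time.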

h



