[Xapian-discuss] what is the fastest way to fetch results which are sorted by timestamp ?
Henry C.
henka at cityweb.co.za
Thu Aug 11 15:13:47 BST 2011
On Thu, August 11, 2011 13:17, Richard Boulton wrote:
>> In our case we're indexing hundreds of millions of docs to several hundred
>> thousand sub-indexes, then merging those to hundreds of indexes which are
>> searched against.  Keeping all that sorted (by custom "PageRank" value)
>> during index-time to maximise sorted-result performance is nigh on
>> impossible.
>
> Indeed. What might be more possible would be to partition the
> documents by pagerank value into several shards; so that low values go into one
> bucket, middle into another, high into another still. Then, it should be
> possible to merge the databases keeping these shards together and in
> decreasing weight order (by assigning document IDs in particular ranges to
> each shard, and using the option of xapian-compact that preserves docids), and
> to keep track of the maximum value in each shard. You can then use a custom
> PostingSource to take advantage of the known maximum value for each range of
> document IDs in the output database, allowing the search to terminate early in
> many cases.
>
> Sorry, that's not a very coherent explanation - it's another technique
> that I've experimented with in the past, with some success, but again it's
> quite tricky to implement.
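For illustration only, here is a rough sketch of the early-termination idea from the quoted reply, in plain Python rather than against the Xapian API. It assumes shards were merged so that higher-value shards own lower docid ranges, and that the maximum value per range was recorded at index time. The SHARDS table, max_value_from() and top_k() are hypothetical names invented for this sketch; a real implementation would put the same bound inside a custom PostingSource.

```python
import heapq

# Hypothetical layout: (first_docid, last_docid, max_value_in_shard).
# Shards are ordered so that higher-value shards get lower docids.
SHARDS = [
    (1, 1000, 9.7),
    (1001, 2000, 6.2),
    (2001, 3000, 2.9),
]


def max_value_from(docid, shards=SHARDS):
    """Upper bound on the value of any document with id >= docid."""
    bounds = [peak for _first, last, peak in shards if last >= docid]
    return max(bounds) if bounds else 0.0


def top_k(matches, values, k, shards=SHARDS):
    """Return (top-k docids by value, number of documents examined).

    matches: docids matching the query, in ascending order.
    values:  mapping of docid -> sort value (e.g. a PageRank-like score).
    """
    heap = []      # min-heap of (value, docid); heap[0] is the weakest kept hit
    examined = 0
    for docid in matches:
        # Stop once no remaining document can beat the weakest kept hit.
        if len(heap) == k and max_value_from(docid, shards) <= heap[0][0]:
            break
        examined += 1
        value = values[docid]
        if len(heap) < k:
            heapq.heappush(heap, (value, docid))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, docid))
    ranked = sorted(heap, key=lambda hit: (-hit[0], hit[1]))
    return [docid for _value, docid in ranked], examined


values = {5: 9.0, 10: 7.5, 900: 8.0, 1500: 6.0, 1800: 5.0, 2500: 2.0}
result = top_k(sorted(values), values, k=2)
```

In this toy data the scan stops at docid 1500, because the best value still reachable (6.2) cannot displace either of the two hits already held. How much this saves in practice depends on how cleanly the shards separate the value ranges.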
hmm, now you've gone and done it: you've turned my minor itch into a raging
rash.
Thank you for that thoughtful reply, it's given me ideas to play with -- given
time (sigh).
h