[Xapian-discuss] what is the fastest way to fetch results which are sorted by timestamp ?
Tim Brody
tdb2 at ecs.soton.ac.uk
Thu Aug 11 16:29:12 BST 2011
On Thu, 2011-08-11 at 12:17 +0100, Richard Boulton wrote:
> On 11 August 2011 11:18, Henry C. <henka at cityweb.co.za> wrote:
> > It's a real pity xapian-compact doesn't have a --sort-by-value argument to
> > perform post-indexing basic sorting of some kind.
> >
> > Is something like this even possible (by that I mean a change to
> > xapian-compact code)?
>
> It's not really possible to do sorting (or other reordering of docids)
> during the process that xapian-compact performs; it's working at a
> lower level than that, stitching chunks of postlists together without
> actually interpreting their contents.
> Short of that being implemented, to sort the database you really
> need to work at pretty much the level of the xapian database API; i.e.,
> implement something more like the copydatabase tool, which copies
> documents in the new order. I've written code (in python) to sort
> databases using this method in the past - which worked ok for a few
> million documents, but isn't particularly efficient. I don't have the
> rights to distribute that code, but it was pretty simple. If I
> remember correctly, it pulled the values to sort by into a numpy
> array, and used one of numpy's functions to produce a mapping from the
> old docid to the new docid, and then just ran through the old database
> reading documents and writing them to the new database in the correct
> position.
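
As a rough illustration of the approach Richard describes, here is a minimal
sketch, not his code. It swaps numpy for the standard library's sorted(), and
the helper names build_docid_mapping and copy_sorted are hypothetical. The
copy step assumes the Xapian Python bindings are installed and that the sort
key lives in a value slot whose raw bytes sort in the desired order.

```python
def build_docid_mapping(values_by_docid):
    """Map each old docid to a new docid so new docids ascend with the sort value."""
    # Sort old docids by (value, old docid); the docid tie-break keeps ties deterministic.
    ordered = sorted(values_by_docid, key=lambda d: (values_by_docid[d], d))
    # New docids start at 1 because Xapian docids are positive integers.
    return {old: new for new, old in enumerate(ordered, start=1)}


def copy_sorted(src_path, dst_path, slot):
    """Copy src into dst with docids reassigned in value order (assumes xapian bindings)."""
    import xapian  # imported lazily so the mapping helper works without Xapian

    src = xapian.Database(src_path)
    dst = xapian.WritableDatabase(dst_path, xapian.DB_CREATE_OR_OVERWRITE)
    # The postlist of the empty term iterates over every document in the database.
    values = {p.docid: src.get_document(p.docid).get_value(slot)
              for p in src.postlist("")}
    for old, new in build_docid_mapping(values).items():
        # replace_document() creates the document at that docid if it is unused.
        dst.replace_document(new, src.get_document(old))
    dst.commit()
```

Holding every sort value in memory is what limits this to databases of a few
million documents or so.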
Out of curiosity, if you left a gap between every docid, would Xapian
maintain an efficient index when you re-insert documents?
e.g.
a. 10 - 2010-05-01
b. 20 - 2010-06-01
c. 30 - 2010-07-01
Then at a later date you re-index c. as 2010-05-15 by giving it an
intermediate docid:
a. 10 - 2010-05-01
c. 15 - 2010-05-15
b. 20 - 2010-06-01
So maintaining a sorted index becomes an exercise in defragmenting
rather than building an entire new DB whenever a document's ranking
increases?
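
To make the idea concrete, here is a minimal sketch of just the docid-picking
step. The function name pick_docid and the gap parameter are hypothetical, and
it says nothing about how efficiently Xapian stores sparse docids. It returns
None when no free docid is left between the neighbours, which is the point at
which a "defragment" (renumbering) pass would be needed.

```python
import bisect


def pick_docid(sorted_docids, sorted_keys, new_key, gap=10):
    """Return a free docid that keeps docid order equal to key order, or None."""
    # Find where the new key falls among the existing (already sorted) keys.
    i = bisect.bisect_left(sorted_keys, new_key)
    # Neighbouring docids; 0 is used as the lower bound because docids start at 1.
    lo = sorted_docids[i - 1] if i > 0 else 0
    # Past the end, pretend there is a neighbour two gaps away so mid lands one gap on.
    hi = sorted_docids[i] if i < len(sorted_docids) else lo + 2 * gap
    mid = (lo + hi) // 2
    # No integer strictly between lo and hi means the gap is exhausted.
    return mid if lo < mid < hi else None


# With c. removed, a. and b. remain; re-inserting c. dated 2010-05-15:
print(pick_docid([10, 20], ["2010-05-01", "2010-06-01"], "2010-05-15"))
```

Each halving uses up part of the gap, so a run of inserts into the same range
exhausts it after only a few steps.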
It's an annoying problem that seems solvable only by massive duplication
(although in my inexpert view ASC/DESC should be doable on a single index?).
Cheers,
Tim.