[Xapian-discuss] creating index for document values

Olly Betts olly at survex.com
Wed Aug 20 02:31:02 BST 2008


Please don't top post.

On Sat, Aug 16, 2008 at 09:36:25AM -0700, mark wrote:
> On Sat, Aug 16, 2008 at 9:08 AM, mark <markkicks at gmail.com> wrote:
> > i want to query a database and sort the results by a value ( which i
> > created like this doc.add_value(1, some_value)).
> > the default search is really fast, but now if i try to sort the
> > results by this value, it takes over 5 seconds on database with around
> > 2 million documents.
> > is there anyway i can speed up this? do i need to create an index on
> > the database on this value to make it fast?

Sorting by value is currently significantly slower than the default
relevance order.  The main issue is how values are currently stored.

You can help things out by only storing the values you really need
(all of a document's values are stored together, and so are retrieved
together) and by putting the values you access most in the
lowest-numbered slots (the values are serialised in slot order, so this
minimises the unpacking required).  The former matters more than the
latter.
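
To make the slot-numbering point concrete, here is a toy model in Python.
It is only an illustration and is NOT Xapian's real on-disk format: the
one-byte slot and length fields and the names pack_values and read_value
are made up for this sketch.  It just shows why a value in a low slot can
be reached after examining fewer entries when everything is packed into
one blob.

```python
# Toy model only: this is NOT Xapian's actual storage format.
# All names here (pack_values, read_value) are hypothetical.

def pack_values(values):
    """Serialise {slot: bytes} into one blob, ordered by slot number."""
    blob = bytearray()
    for slot in sorted(values):
        data = values[slot]
        if slot > 255 or len(data) > 255:
            raise ValueError("toy format only handles one-byte slot and length")
        blob += bytes([slot, len(data)]) + data
    return bytes(blob)


def read_value(blob, wanted_slot):
    """Return (value, entries_examined), scanning the blob from the start."""
    pos = 0
    examined = 0
    while pos < len(blob):
        slot, length = blob[pos], blob[pos + 1]
        examined += 1
        if slot == wanted_slot:
            return blob[pos + 2:pos + 2 + length], examined
        if slot > wanted_slot:
            break  # slots are stored in ascending order, so it is absent
        pos += 2 + length
    return b"", examined


blob = pack_values({0: b"20080816", 1: b"mark", 7: b"some long title"})
print(read_value(blob, 0))  # found after examining 1 entry
print(read_value(blob, 7))  # found after examining 3 entries
```

In this model the whole blob still has to be fetched whichever slot you
want, which is why storing fewer values helps more than reordering them.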

I'm actually in the middle of changing how values are stored to address
this and similar issues.

> i found this in the docs page of sorting, and it says "arrange for
> documents to be indexed in date order"? now how do i do this?
> thanks again!
> 
> "http://xapian.org/docs/sorting.html
> If you want to offer a "sort by date" feature, and can arrange for
> documents to be indexed in date order (or a close-enough
> approximation), "

Which part is unclear?

For example, take your documents and sort them into date order, then
feed them into Xapian in that order.  Now "sort by date" is the same as
"sort by docid"...

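As a minimal sketch of that idea in Python: the records and the
index_in_date_order helper below are hypothetical, and a plain dict
stands in for the database so the example runs without Xapian installed.
With a real, freshly created xapian.WritableDatabase you would call
db.add_document(doc) in the same sorted order, and it would hand out
increasing docids.

```python
from datetime import date

# Hypothetical input records; a real indexer would read these from its data source.
records = [
    {"title": "third", "date": date(2008, 8, 20)},
    {"title": "first", "date": date(2008, 8, 16)},
    {"title": "second", "date": date(2008, 8, 18)},
]


def index_in_date_order(records):
    """Assign docids 1, 2, 3, ... in ascending date order.

    Stand-in for adding documents to a new database one at a time,
    oldest first.
    """
    index = {}
    by_date = sorted(records, key=lambda r: r["date"])
    for docid, rec in enumerate(by_date, start=1):
        index[docid] = rec
    return index


index = index_in_date_order(records)

# Newest first is now just descending docid order; no value lookup is needed.
newest_first = [index[docid]["title"] for docid in sorted(index, reverse=True)]
print(newest_first)  # ['third', 'second', 'first']
```

At query time the sorting documentation quoted above describes asking for
results in docid order (Enquire's set_docid_order) rather than sorting by
a value.
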
Cheers,
    Olly


