[Xapian-discuss] Optimization and Load balancing with Xapian

David Levy dvid.levy at gmail.com
Wed Feb 22 14:59:20 GMT 2006


I have tested my database on the temporary filesystem /dev/shm, which is
comparable to a RAM disk.

Performance seems a lot better ... as long as I don't use sorting!
With sorting, even on this filesystem, searches can take more than 10 seconds.
I am really disappointed by this issue.
Does anyone else have the same needs as me?

If it's not the hardware, then it must be the software, or maybe the
configuration.

Do you know if Lucene uses the same mechanism to sort results?

Regards


On 2/20/06, Olly Betts <olly at survex.com> wrote:
>
> On Mon, Feb 20, 2006 at 12:05:23PM +0200, David Levy wrote:
> > > But sorting as currently designed does need to process every matching
> > > document, which is going to be slow for a large database if the query
> > > matches a lot of documents.
> >
> > Will this mechanism change in future releases?
>
> It's possible there's a better way to handle it.  If we came up with a
> workable scheme and somebody implemented it then we'd have a different
> mechanism.  So it might change, but it's not something I'm currently
> working on or actively planning to.
>
> The problem is that you really want to process the documents in sorted
> order, as you can then just stop once you've filled the MSet.  You could
> list the document ids in ranked order for each sortable value (it would
> take a fair amount of space), but then all the posting lists
> list documents in id order, so you can't easily process documents in
> sorted order even though you would then know that order.  You could
> try to visit the docids in that order by random-access-style seeking
> into the posting lists.  That would work OK if the top N items all made
> it into the MSet, but at some point it'll become less efficient...
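
The trade-off described above can be sketched in a few lines of Python. This
is only a toy model, not Xapian's implementation: the function names, the
`value_of` dictionary and the `matches` set are all made up for illustration,
and a set lookup stands in for seeking into posting lists.

```python
import heapq


def top_n_by_value(matching_docids, value_of, n):
    """Sort-by-value as currently designed: read the value of EVERY match."""
    reads = 0
    keyed = []
    for docid in matching_docids:          # posting lists yield docid order
        keyed.append((value_of[docid], docid))
        reads += 1
    best = heapq.nsmallest(n, keyed)       # keep only the n smallest values
    return [docid for _, docid in best], reads


def top_n_presorted(docids_in_value_order, matches, n):
    """Hypothetical alternative: walk docids in value order, stop when full."""
    mset = []
    visited = 0
    for docid in docids_in_value_order:
        visited += 1
        if docid in matches:               # stands in for a posting-list seek
            mset.append(docid)
            if len(mset) == n:
                break                      # early exit once the MSet is full
    return mset, visited


# Toy data (assumed): docid -> sortable value.
value_of = {1: 50, 2: 10, 3: 40, 4: 20, 5: 30}
docids_in_value_order = sorted(value_of, key=value_of.get)
```

On this toy data, when all five documents match and n is 2, the first
function reads five values while the second stops after visiting two entries.
When only documents 1, 3 and 5 match, the second has to visit four entries to
find two matches, more than the three value reads the first one needs, which
is the "less efficient at some point" case.
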
>
> But it looks like this isn't currently the bottleneck.
>
> > I have compacted and removed large fields in the index. So the database is
> > half the size ... but performance is still slow.
> > I am thinking about using "ramdisks" maybe; and I am checking my hard disks
> > too.
> > Did you use ramdisks with Xapian yet? Does it help?
>
> The VM system in a modern Unix-like OS will cache blocks recently read
> from disk.  This dynamic caching is probably going to do as well as
> trying to force parts of the database into RAM.  By all means give it
> a try, but I doubt it's a magic bullet.
>
> > > But even now, "sort by date" is still acceptably fast on 30 million
> > > documents, which points the finger strongly towards accessing the values
> > > as taking most of the time.
> >
> > How do you mean?
> > I had bad results with < 1M documents:
>
> I mean "sort by date" is acceptably fast *on gmane*, which doesn't use
> sorting on values, but still has to trawl through the whole of each
> posting list in this case.  That strongly suggests that the bottleneck
> is currently with getting at the values to do the sorting.
>
> > However, I used the "collapse" parameter ... Is it time-consuming even if
> > there are no records to collapse in the results?
>
> Collapsing still needs to read the values, even if they are unique.  So
> if collapsing is also slow, that further points the finger at the
> storage of the values.
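
Why collapsing costs a value read per document even when nothing ends up
collapsed can be seen in a minimal Python sketch. It is an illustration
under assumed names (`collapse`, `collapse_value_of`), not Xapian's actual
code: the collapse key has to be fetched before anyone can tell whether it
is a duplicate.

```python
def collapse(ranked_docids, collapse_value_of):
    """Keep the best-ranked document for each distinct collapse value."""
    seen = set()
    kept = []
    value_reads = 0
    for docid in ranked_docids:
        key = collapse_value_of[docid]   # read is needed before we know
        value_reads += 1                 # whether the key is unique or not
        if key not in seen:
            seen.add(key)
            kept.append(docid)
    return kept, value_reads
```

With all-unique keys every document is kept, yet the number of value reads
is the same as when some documents are dropped.
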
>
> Cheers,
>     Olly
>



--
David LEVY {selenium}
Website ~ http://www.davidlevy.org
Wishlist Zlio ~ http://david.zlio.com/wishlist
Blog ~ http://selenium.blogspot.com