[Xapian-discuss] Indexing speed benchmark - Xapian, Solr

Michel Pelletier pelletier.michel at gmail.com
Fri Apr 17 22:39:44 BST 2009


Without seeing the code this person wrote for the benchmark, it's
difficult for us to say.  Recently I was bulk indexing into Xapian and
ran out of memory.  This was not Xapian's fault; I had an obvious and
stupid bug in my code that prevented Python's garbage collector from
collecting already-indexed objects.  The author may well have run into
a similar problem without knowing it, or done something clearly
inefficient, like flushing after every single added document.  Without
code, we'll never know.
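
To illustrate the flushing point, here is a minimal sketch of batched
commits.  It assumes `db` is any object exposing `add_document()` and
`commit()` in the way `xapian.WritableDatabase` does (older Xapian
releases call the latter `flush()`).  The function name
`index_in_batches` and the batch size of 1000 are hypothetical, chosen
only for illustration; this is not the import script described in this
message.

```python
def index_in_batches(db, documents, batch_size=1000):
    """Add documents to db, committing once per batch.

    `db` is assumed to expose add_document() and commit(), as
    xapian.WritableDatabase does. `documents` can be any iterable;
    a generator keeps only the current document alive, so objects
    that are already indexed can be garbage collected.
    Returns the number of commits performed.
    """
    commits = 0
    pending = 0
    for doc in documents:
        db.add_document(doc)
        pending += 1
        if pending >= batch_size:
            # One commit per batch instead of one per document.
            db.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit whatever is left over from the last partial batch.
        db.commit()
        commits += 1
    return commits
```

With 25,000 documents and a batch size of 1000 this commits 25 times
rather than 25,000 times.  Feeding it a generator over the database
rows, rather than a list built up front, also avoids holding
references to documents that have already been indexed.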

On my year-old MacBook Pro laptop, I can bulk index about 90
employment description documents ("job ads") per second, taking about
280 seconds to index 25 thousand documents.  These documents are coming
out of a relational database and into Xapian.  Those 25K documents,
which include many terms and values and full document data, take up
about 52MB of disk.  During the import, the resident process memory of
the import script never goes over 60MB.
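
As a quick check on those figures, the arithmetic can be worked
through in a few lines.  The variable names are made up for
illustration, and the per-document size assumes 1MB = 1024KB.

```python
# Figures quoted in the message above.
docs = 25_000        # documents indexed
seconds = 280        # wall-clock time for the import
index_mb = 52        # on-disk size of the resulting index

# Throughput: documents indexed per second.
docs_per_second = docs / seconds

# Average on-disk footprint per document (assumes 1MB = 1024KB).
kb_per_doc = index_mb * 1024 / docs

print(f"{docs_per_second:.1f} docs/s")       # 89.3 docs/s
print(f"{kb_per_doc:.2f} KB per document")   # 2.13 KB per document
```

So "about 90 per second" matches 25 thousand documents in 280
seconds, and the index works out to roughly 2KB per document.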

I agree with the poster that searching Xapian is very fast. :)

-Mike

On Sun, Apr 12, 2009 at 7:26 AM, Andy <angelflow at yahoo.com> wrote:
>
> I came across this benchmark between Xapian & Solr:
>
> http://www.anur.ag/blog/2009/03/xapian-and-solr/
>
> According to the benchmark, a doc set that took Solr 34 min to index took Xapian 7 hours. Solr's index is also much smaller - 2.5GB to Xapian's 8.9GB.
>
> I'm new to Xapian. Just wondering if results like these are typical? Is indexing speed & size a known issue in Xapian? Or is there some other explanation for the big difference between the Solr & Xapian results?
>


