[Xapian-discuss] about index speed of xapian

Olly Betts olly at survex.com
Mon Nov 26 22:00:37 GMT 2012


On Wed, Nov 21, 2012 at 05:46:26PM +0800, superthread wrote:
> I use Xapian to index a text file of 268M. I treat each line as a
> document, and each line has two fields, like 13445511 | 111115151.
> There are 10000000 records, and XAPIAN_FLUSH_THRESHOLD is set to 1000000.

How did you pick that XAPIAN_FLUSH_THRESHOLD setting?  It may be
lower than you could set it, or it may be high enough that you're
creating VM pressure, in which case a lower setting would actually
be faster.
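
One way to pick the setting empirically is to time the same indexing run
at several thresholds.  The following is only a rough sketch: the helper
names time_with_threshold and sweep are made up for illustration, and the
indexer command is assumed to be whatever program you already use, rebuilding
a fresh database on each run.  The only Xapian-specific part is the
XAPIAN_FLUSH_THRESHOLD environment variable itself.

```python
import os
import subprocess
import time


def time_with_threshold(cmd, threshold):
    """Run cmd with XAPIAN_FLUSH_THRESHOLD set and return elapsed seconds.

    cmd is the argument list for your own indexer (an assumption here);
    it should build a fresh database on every run so timings are comparable.
    """
    env = dict(os.environ)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    start = time.perf_counter()
    # check=True raises CalledProcessError if the indexer exits non-zero.
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def sweep(cmd, thresholds):
    """Time cmd once per threshold; return {threshold: seconds}."""
    return {t: time_with_threshold(cmd, t) for t in thresholds}
```

For example, sweep(["./my_indexer", "input.txt"], [10000, 100000, 1000000])
would run a hypothetical ./my_indexer three times.  If the largest setting
is slower than a smaller one, that suggests memory pressure rather than
flush overhead is the bottleneck.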

Also, what version of Xapian are you using, and with which database
backend?  One of the changes in brass over chert is:

  + Batched posting list changes during indexing use significantly less
    memory.

So using brass should at least allow you to set XAPIAN_FLUSH_THRESHOLD
higher, and the reduced memory usage might make it faster even for the
same setting.

These are very small documents, which isn't a case I think anyone has
looked at closely, so it would be interesting to profile it.  There
are some tips here:

http://trac.xapian.org/wiki/ProfilingXapian
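
For a sense of scale, the figures in the original question can be turned
into a per-document size and an approximate flush count.  This is just
back-of-the-envelope arithmetic, and it assumes "268M" means 268 * 10^6
bytes and that an automatic flush happens once per XAPIAN_FLUSH_THRESHOLD
documents:

```python
# Figures taken from the original question.
file_size_bytes = 268 * 1000 * 1000   # "268M", assumed to mean 10^6-byte units
num_documents = 10_000_000            # one document per line
flush_threshold = 1_000_000           # XAPIAN_FLUSH_THRESHOLD

# Average input size per document, including the " | " separator and newline.
bytes_per_document = file_size_bytes / num_documents

# Ceiling division: number of batches of flush_threshold documents.
flushes = -(-num_documents // flush_threshold)

print(f"about {bytes_per_document:.1f} bytes per document")
print(f"roughly {flushes} flushes over the whole run")
```

So each document is only a few tens of bytes, and per-document overhead is
likely to matter far more here than it would for typical text documents.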

Cheers,
    Olly


