[Xapian-discuss] about index speed of xapian
Olly Betts
olly at survex.com
Mon Nov 26 22:00:37 GMT 2012
On Wed, Nov 21, 2012 at 05:46:26PM +0800, superthread wrote:
> I use Xapian to index a text file; its size is 268M. I take each line
> as a document, and each line has two fields, like 13445511 | 111115151.
> The record count is 10000000. XAPIAN_FLUSH_THRESHOLD is set to 1000000.
How did you pick that XAPIAN_FLUSH_THRESHOLD setting? It could be
that it's lower than you could usefully set it, or it could be high
enough that you're creating VM pressure, in which case a lower setting
would actually be faster.
Also, what version of Xapian are you using, and with which database
backend? One of the changes in brass over chert is:
  + Batched posting list changes during indexing use significantly less
    memory.
So using brass should at least allow you to set XAPIAN_FLUSH_THRESHOLD
higher, and the reduced memory usage might make it faster even for the
same setting.
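
If it helps, here's a rough sketch of how you might time the same
indexing run at a few different thresholds. It assumes the Xapian Python
bindings are installed; the "XK"/"XV" term prefixes and the function
names are placeholders I've made up for illustration. As far as I know
the environment variable is read when the writable database is opened,
so the sketch sets it before each open.

```python
import os
import time


def parse_line(line):
    """Split a record like '13445511 | 111115151' into its two fields."""
    left, right = line.split("|", 1)
    return left.strip(), right.strip()


def index_file(txt_path, db_path, flush_threshold):
    """Index one document per line; return the elapsed time in seconds."""
    # Set the threshold before opening the database (assumption: the
    # backend reads the variable when the writable database is opened).
    os.environ["XAPIAN_FLUSH_THRESHOLD"] = str(flush_threshold)

    import xapian  # needs the Xapian Python bindings

    # Start from an empty database so each timing run is comparable.
    db = xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OVERWRITE)
    start = time.perf_counter()
    with open(txt_path) as f:
        for line in f:
            if "|" not in line:
                continue  # skip blank or malformed lines
            first, second = parse_line(line)
            doc = xapian.Document()
            doc.add_term("XK" + first)   # hypothetical prefix for field 1
            doc.add_term("XV" + second)  # hypothetical prefix for field 2
            doc.set_data(line.strip())
            db.add_document(doc)
    db.commit()  # include the final flush in the measured time
    elapsed = time.perf_counter() - start
    db.close()
    return elapsed
```

Calling index_file() on the same input with, say, 10000, 100000 and
1000000 and comparing the returned times (while watching memory use)
should show whether the current setting is helping or hurting.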
These are very small documents, which isn't a case I think anyone has
looked at closely, so it would be interesting to profile it. There
are some tips here:
http://trac.xapian.org/wiki/ProfilingXapian
Cheers,
Olly