[Xapian-discuss] Flint, UTF-8 and "large" documents
John Wang
johncwang at gmail.com
Tue Oct 3 16:49:55 BST 2006
I'm currently using the 0.9.6 svn 7230 UTF-8 snapshot tarball with a Flint
backend and the Perl bindings.
When I load a certain collection, the above configuration creates an index
that appears to be corrupted. I can't find any indication of something going
wrong while I'm building the index, but when I open it for reading
immediately after building, I get the following:
*** glibc detected *** free(): invalid pointer: 0x0acb6ab0 ***
Aborted
This happens when I flush the db once after loading all the documents in the
collection. If I periodically flush while I'm loading, everything works
fine. The collection I'm loading has the following statistics:
Number of documents: 412
Average terms per document: 233
Maximum terms per document: 1557
Total terms in collection: 96290
In this particular case, the index gets corrupted when I flush every 23
documents, but is fine if I flush every 22 documents.
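For illustration only, the periodic-flush workaround described above can be
sketched in Python as follows. This is not the Perl code used for the actual
load: FakeDatabase is a hypothetical stand-in that only records calls, and
load_with_periodic_flush is an invented helper name. The only assumption about
a real database object is that it has add_document() and flush() methods.

```python
class FakeDatabase:
    """Hypothetical stand-in for a writable database; it only records calls."""

    def __init__(self):
        self.pending = 0        # documents added since the last flush
        self.flush_sizes = []   # number of documents covered by each flush

    def add_document(self, doc):
        self.pending += 1

    def flush(self):
        self.flush_sizes.append(self.pending)
        self.pending = 0


def load_with_periodic_flush(db, documents, flush_every):
    """Add every document to db, flushing after each flush_every documents.

    A final flush covers any remainder. Returns the number of documents added.
    """
    if flush_every < 1:
        raise ValueError("flush_every must be at least 1")
    count = 0
    for count, doc in enumerate(documents, start=1):
        db.add_document(doc)
        if count % flush_every == 0:
            db.flush()
    # Flush whatever is left over from the last partial batch.
    if count % flush_every != 0:
        db.flush()
    return count


if __name__ == "__main__":
    db = FakeDatabase()
    # 412 documents with a flush every 22, the interval that works here.
    load_with_periodic_flush(db, range(412), 22)
    print(db.flush_sizes)
```

Changing the last argument from 22 to 23 corresponds to the failing case.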
The same document collection loads fine with the standard 0.9.6 release
without UTF-8 (also using Flint), even without periodic flushing. I've also
loaded other collections with more, but smaller, documents, flushing only at
the end, and those have been fine.
Anyone know why this is happening and what to do about it?
Thanks.
--
John Wang
http://www.dev411.com/blog/