errors on rebuild

Ryan Cross rcross at amsl.com
Sat Mar 25 23:36:25 GMT 2017


Hi Olly,

After upgrades my stack is now:

Python 2.7
Django 1.8
Haystack 2.6.0
Xapian 1.4.3 (latest xapian-haystack backend with some modifications)

Using the same rebuild command as below but with --batch-size=50000.

The issue has now become one of performance. I am indexing 2.2 million documents. Using delve I can see that indexing starts off at about 100,000 records an hour, which is consistent with the roughly 24 hour rebuild time I was experiencing with Xapian 1.2.21 (chert). However, after 75 hours of build time, the index is about 75% complete and records are being processed at a rate of 10,000/hr. The index is 51GB in size, of which 30GB is position.glass.
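
To put those rates side by side, here is a rough back-of-the-envelope sketch. It only uses the figures quoted above, and it assumes (purely for illustration) that the current rate would hold steady for the rest of the build, which it may well not.

```python
# Figures quoted in the paragraph above.
total_docs = 2200000        # "2.2 million documents"
initial_rate = 100000.0     # docs/hour at the start of the rebuild
current_rate = 10000.0      # docs/hour after 75 hours
fraction_done = 0.75        # "about 75% complete"

# Time for a full rebuild if the initial rate had been sustained.
full_rebuild_hours = total_docs / initial_rate

# Time left if the current rate stays constant (an assumption).
remaining_docs = total_docs * (1 - fraction_done)
remaining_hours = remaining_docs / current_rate

print("full rebuild at initial rate: %.0f hours" % full_rebuild_hours)
print("remaining at current rate:    %.0f hours" % remaining_hours)
```

So the last quarter of the index would take longer than twice the whole rebuild did at the starting rate.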

Here is a one-minute strace summary:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 63.97    1.272902          13    100240           pread
 33.71    0.670733          14     48175           pwrite
  0.57    0.011253           8      1484           read
  0.45    0.008938           6      1524           fstat
  0.36    0.007098           6      1270           lseek
  0.25    0.004988          20       254           open
  0.18    0.003544          14       254           recvfrom
  0.11    0.002148           8       254           sendto
  0.10    0.002056           8       254           close
  0.10    0.001949           8       254           poll
  0.07    0.001429          11       127           munmap
  0.06    0.001111           9       127           mmap
  0.04    0.000802           6       127       127 ioctl
  0.04    0.000773           6       127           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    1.989724                154471       127 total

This is for ten documents, each with a number of terms in the 10s to low 100s range. Is there a way I can tune for better performance?
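
For scale, the pread and pwrite counts from the summary can be divided by the ten documents. This is only a rough sketch that takes the "ten documents" figure at face value and treats the one-minute sample as representative.

```python
# Call counts copied from the one-minute strace summary above.
documents = 10.0
calls = {"pread": 100240, "pwrite": 48175}

# Average number of each syscall per indexed document.
per_doc = dict((name, count / documents) for name, count in calls.items())

# How many reads are issued for every write.
read_write_ratio = calls["pread"] / float(calls["pwrite"])

print(per_doc)
print(round(read_write_ratio, 2))
```

That is on the order of ten thousand reads and several thousand writes per document, with about two reads for every write.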

Thanks,
Ryan


> On Mar 2, 2017, at 4:48 PM, Ryan Cross <rcross at amsl.com> wrote:
> 
> Hi Olly,
> 
> Thanks for the detailed response.  I hadn’t realized there was a new xapian haystack backend.  I’m going to try that but I have some upgrades to do first.  Django 1.8, etc.
> 
> Thanks,
> Ryan
> 
>> On Feb 28, 2017, at 3:40 PM, Olly Betts <olly at survex.com> wrote:
>> 
>> On Mon, Feb 27, 2017 at 10:29:46AM -0800, Ryan Cross wrote:
>>> I am trying to rebuild an index of 2+ million documents and have not been successful.  I am running 
>>> 
>>> Python 2.7
>>> Django 1.7
>>> Haystack 2.1.1
>>> Xapian 1.2.21
>>> 
>>> The index rebuild command I’m using is: django-admin.py rebuild_index --noinput --batch-size=100000
>>> The rebuild completes but an immediate xapian-check returns this error:
>> [...]
>>> Trying the latest stable version, Xapian 1.4.3, it fails during the rebuild:
>>> 
>>> All documents removed.
>>> Indexing 2233651 messages
>>> Traceback (most recent call last):
>>>>>> 
>>> File "/a/mailarch/current/haystack/management/commands/update_index.py", line 221, in handle_label
>>>   self.update_backend(label, using)
>>> File "/a/mailarch/current/haystack/management/commands/update_index.py", line 266, in update_backend
>>>   do_update(backend, index, qs, start, end, total, self.verbosity)
>>> File "/a/mailarch/current/haystack/management/commands/update_index.py", line 89, in do_update
>>>   backend.update(index, current_qs)
>>> File "/a/mailarch/current/haystack/backends/xapian_backend.py", line 286, in update
>>>   database.close()
>> 
>> What's the version of xapian-haystack?  There's not a database.close() anywhere
>> near line 286 in git master:
>> 
>> https://github.com/notanumber/xapian-haystack/blob/master/xapian_backend.py#L286
>> 
>>> xapian.DatabaseCorruptError: Expected block 615203 to be level 0, not 1
>>> docdata:
>>> blocksize=8K items=380000 firstunused=21983 revision=38 levels=2 root=21410
>> 
>> Is that the full output of xapian-check?
>> 
>>> Any suggestions for how I could get more information to troubleshoot this
>>> failure would be greatly appreciated.
>> 
>> Is the data to reproduce this something you can make available?
>> 
>> I'd stick with Xapian 1.4.3 for trying to narrow this down (if it's a Xapian
>> bug we can backport the fix once identified).
>> 
>> The error message means that a block which was expected to be at the leaf level
>> was actually marked as being one level above, which suggests either there's an
>> obscure bug in the backend code which only manifests in rare circumstances, or
>> something is corrupting data (could be in memory or on disk).
>> 
>> Since this happens with both 1.2.x and 1.4.x I would tend to suspect it's
>> something external (rather than a bug in Xapian) as the default backends in 1.2
>> and 1.4 have some significant differences.  It's certainly possible it's a
>> Xapian bug, but if so I would expect we'd be seeing other reports, though maybe
>> we've actually had one or two and thought them due to #675, which was fixed in
>> 1.2.21 (however nobody's yet said "no, still seeing that"):
>> 
>> https://trac.xapian.org/ticket/675
>> 
>> You could look at block 615203 of docdata.glass to see what it looks like -
>> that might offer clues:
>> 
>> xxd -g1 -seek $((615203*8192)) -len 8192 docdata.glass
>> 
>> It'd also be good to eliminate possible system issues - e.g. check the disk is
>> healthy (check the SMART status, run fsck on it), run a RAM test (distros often
>> provide a way to run memtest86+ or similar from the boot menu).
>> 
>> Cheers,
>>   Olly
> 


