errors on rebuild
Ryan Cross
rcross at amsl.com
Fri Apr 7 18:17:34 BST 2017
Thanks for the information on the differences between chert and glass.
This explains the performance and index-size changes I’m seeing. For the
time being chert under 1.4.3 is working, and I’ll keep an eye out for new releases.
Thanks,
Ryan
> On Apr 2, 2017, at 6:29 PM, Olly Betts <olly at survex.com> wrote:
>
> On Sat, Mar 25, 2017 at 06:36:25PM -0500, Ryan Cross wrote:
>> After upgrades my stack is now:
>>
>> Python 2.7
>> Django 1.8
>> Haystack 2.6.0
>> Xapian 1.4.3 (latest xapian-haystack backend with some modifications)
>>
>> Using the same rebuild command as below but with --batch-size=50000
>>
>> The issue has now become one of performance. I am indexing 2.2 million
>> documents. Using delve I can see that performance starts off at about
>> 100,000 records an hour. This is consistent with the roughly 24 hour
>> rebuild time I was experiencing with Xapian 1.2.21 (chert). However,
>> after 75 hours of build time, the index is about 75% complete and
>> records are processing at a rate of 10,000/hr. The index is 51GB in
>> size, of which 30GB is position.glass.
>
> One of the big differences between chert and glass is that glass stores
> positional data in a different order such that phrase searches are much
> more I/O efficient. The downside is that this means extra work at index
> time, and more data to batch up in memory. There's a thread discussing
> this here:
>
> https://lists.xapian.org/pipermail/xapian-discuss/2016-April/009368.html
>
>> Here is a one minute strace summary
>>
>> % time seconds usecs/call calls errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>> 63.97 1.272902 13 100240 pread
>> 33.71 0.670733 14 48175 pwrite
>
> A one minute sample is hard to extrapolate from, as the indexing process
> currently goes through phases of flushing changes, so whichever phase the
> one minute is from isn't going to be representative.
>
> But from the information you give, my guess is that the extra memory
> used for batching up changes is pushing you over an I/O cliff, and
> you would get better throughput by reducing the batch size (assuming
> the "batch size" you specify maps to XAPIAN_FLUSH_THRESHOLD or something
> equivalent). Especially likely if you tuned that batch size for chert.
>
> There are some longer term plans to rework the batching and flush process
> which should improve matters a lot (and hopefully remove the need for
> manually tweaking such settings). I'm hoping that will land in the
> next release series, so you could consider sticking with chert for 1.4.x,
> assuming the problematic phrase search cases aren't an issue for you.
> There are various other improvements between chert and glass (better
> tracking of free space, less on-disk overhead) which you'd lose out on
> though.
>
> Cheers,
> Olly
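
Olly's suggestion of reducing the batch size can be tried by lowering
Xapian's flush threshold before rerunning the rebuild. A minimal sketch —
the `rebuild_index` command name and the 10000 value are illustrative
assumptions, not taken from this thread:

```shell
# XAPIAN_FLUSH_THRESHOLD (default: 10000 changed documents) controls how
# many documents Xapian batches in memory before flushing to disk.
# A smaller value flushes more often, trading some throughput for a much
# smaller in-memory batch.
export XAPIAN_FLUSH_THRESHOLD=10000
echo "$XAPIAN_FLUSH_THRESHOLD"

# Then rerun the rebuild with a matching, smaller batch size, e.g.
# (assuming Haystack's rebuild_index management command):
#   ./manage.py rebuild_index --batch-size=10000
```

If throughput improves, the threshold can be raised again incrementally to
find the point just before the I/O cliff.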
More information about the Xapian-discuss mailing list