[Xapian-discuss] Best Practices for Compaction?

Kenneth Loafman kenneth at loafman.com
Sat Sep 19 19:30:30 BST 2009


Thanks for the pointer on ActiveMQ, could be a much better fit.

...Ken

Arjen van der Meijden wrote:
> It obviously depends on your source of data. Ours is a webforum where
> all topics and postings are stored in a database anyway, so not
> bothering about the data until the next cron interval occurs is
> perfectly OK for us.
> 
> If you get some stream of data which you need to catch and then process,
> you may indeed want to queue it up in some form of storage. We've had
> good experience with the ActiveMQ message queueing system, but using a
> normal database should work as well.
> 
> In the case of ActiveMQ you could just pause (or disconnect) your
> consumer for a while during the compaction, while the producers
> continue to offer data to ActiveMQ.
> In a database you'll have to roll your own queuing or similar system,
> but it may be easier to use, especially for batch processing and the
> like.
> 
> Best regards,
> 
> Arjen
> 
> On 19-9-2009 16:45 Kenneth Loafman wrote:
>> Richard Boulton wrote:
>>> 2009/9/18 Arjen van der Meijden <acmmailing at tweakers.net
>>> <mailto:acmmailing at tweakers.net>>
>>>
>>>     Hi Ken,
>>>
>>>     We're only updating the database in intervals, not continuously.
>>>     What we're doing is basically:
>>>     [symlink for the database is to compacted database]
>>>     update "working" database
>>>     change symlink for database to "working"
>>>     compact working to a new compact database
>>>     change symlink for database back to compact
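[Editorial note: the four-step swap scheme quoted above can be sketched as a shell script. The `BASE` location is hypothetical, and the indexing and `xapian-compact` steps are left as comments because they need a real Xapian database; the `ln`+`mv -T` pair makes each symlink switch atomic on Linux so readers never see a missing link.]

```shell
#!/bin/sh
set -e
BASE=/tmp/xapian-swap-demo        # hypothetical location
rm -rf "$BASE" && mkdir -p "$BASE/working" "$BASE/compact"

# Start state: the database symlink points at the compacted copy.
ln -sfn "$BASE/compact" "$BASE/database"

# 1. Update the "working" database (indexing step, omitted here).
# 2. Change the symlink to point at "working" while compaction runs.
ln -sfn "$BASE/working" "$BASE/database.tmp"
mv -T "$BASE/database.tmp" "$BASE/database"

# 3. Compact working to a new compact database, e.g.:
#      xapian-compact "$BASE/working" "$BASE/compact.new"
#      rm -rf "$BASE/compact" && mv "$BASE/compact.new" "$BASE/compact"
# 4. Change the symlink back to the compacted copy.
ln -sfn "$BASE/compact" "$BASE/database.tmp"
mv -T "$BASE/database.tmp" "$BASE/database"

readlink "$BASE/database"
```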
>>>
>>>
>>> I recommend using a stub-database file instead of a symlink - that way,
>>> if a reader has opened some of the database files but not others when
>>> the symlink changes, you don't get an inconsistent set of database files
>>> being opened.
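[Editorial note: a Xapian stub database is a small text file whose lines name the real database(s), e.g. with a hypothetical path:]

```
auto /var/search/db-compact
```

Readers open the stub file itself as the database; swapping then means rewriting this one file (write a temporary file and rename it over the stub), so a reader always resolves every table of the database against a single consistent path.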
>>>
>>> There's a variety of swapping schemes like this: I've used several,
>>> depending on the update-speed and search-speed requirements I'm
>>> trying to satisfy.
>>
>> There's still the issue of a two-hour downtime, if I understand things
>> correctly.  During the compaction the source is locked and the target is
>> not usable, so collection has to stop, or be queued through another
>> mechanism, correct?
>>
>> I'm only updating the database in intervals, but I have to collect it
>> within a one-hour interval of when it was produced, or it goes away.
>> Two hours of downtime would mean at least one hour of lost activity
>> unless I'm misunderstanding the whole link/swap process.
>>
>> I'm thinking of using MySQL as a frontend so I can get 24/7 collection,
>> but wanted to avoid the staging complexity if possible.  Having query
>> access to the database unavailable for two hours is OK, but losing data
>> collection is not.
>>
>> ...Thanks,
>> ...Ken
>>
> 
