[Xapian-discuss] Flint Backend

Thu Jun 23 15:13:23 BST 2005

Olly Betts wrote:
> On Thu, Jun 23, 2005 at 08:25:31AM +0200, Arjen van der Meijden wrote:
> 
>>quartzcompact doesn't do that much for the position table: it goes 
>>from 7.8GB (a db that has been in use for quite some time now) to 7.0GB, 
>>which is about 11% (actually more than I had expected).
>>Of course I can't tell which is overhead generated due to it being in 
>>long use and what is actual compaction-gain.
> 
> You can use "quartzcompact -n" to compact but not do tag splitting to
> fill blocks fuller (and "quartzcompact -F" to generate larger than
> normal tag chunks and reduce size further, but I'd not recommend
> using this if you plan to update the compacted database again).
> 
> The difference between "quartzcompact -n" and "quartzcompact" (or the
> extra gain from running "quartzcompact" after "quartzcompact -n") is
> probably what you're thinking of as the "actual compaction-gain".

We don't update the compacted database. If that should ever happen it would 
be an emergency situation, in which case we'd probably just rebuild the 
entire index from scratch.
Will -n and -F work for tables other than position as well?
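
The comparison Olly suggests can be worked out from three on-disk sizes: the in-use database, the output of "quartzcompact -n", and the output of plain "quartzcompact". The sketch below is only an illustration of that arithmetic. The function names and the idea of summing file sizes per database directory are assumptions made for this example, not part of any Xapian tool.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def split_gain(original, repacked, compacted):
    """Split the total size reduction into fractions of the original size.

    original  -- size of the in-use database
    repacked  -- size after "quartzcompact -n" (no tag splitting)
    compacted -- size after plain "quartzcompact"
    """
    if original <= 0:
        raise ValueError("original size must be positive")
    return {
        # Space lost to long-term updating, recovered by "-n" alone.
        "update_overhead": (original - repacked) / original,
        # Extra saving from filling blocks fuller.
        "compaction_gain": (repacked - compacted) / original,
        "total": (original - compacted) / original,
    }


def report(original_dir, repacked_dir, compacted_dir):
    """Format the split for three database directories (hypothetical paths)."""
    sizes = [dir_size(p) for p in (original_dir, repacked_dir, compacted_dir)]
    parts = split_gain(*sizes)
    return "\n".join(f"{label}: {frac:.1%}" for label, frac in parts.items())
```

With the 7.8GB and 7.0GB figures quoted above, the "total" entry comes out a little over 10% of the original size; the other two entries need the size of the "-n" output, which isn't known yet.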

>>Will this give usable figures if I use the current flint backend, or 
>>are the bugs you found such that the size of the index in particular is 
>>negatively influenced?
> 
> 
> With 0.9.1, you can't open a flint index for reading.  Also the
> positionlist packing missed out some information necessary to actually
> unpack the list again, so the size will be slightly underestimated if
> anything.
> 
> If you want to try flint, it's probably best to use a snapshot from SVN.
> This also has the new "xapian-compact" which is like quartzcompact but
> for flint databases.

I've installed an SVN snapshot from this afternoon and I'm downloading 
today's (compacted) index.

> I've now written "flintcompact" (but called it "xapian-compact" with
> an eye to the future!)

Great, I'll test it tomorrow as well on our database then.

>>We have about 1M documents indeed, but that takes up much more than the 
>>4GB of memory the production machine has I guess. You can see above what 
>>size our position-table is. Development-machines here 'only' have 1GB.
> 
> You probably don't want to use XAPIAN_FLUSH_THRESHOLD=1000000 then,
> especially as your documents are large.  Hopefully I can make this
> parameter self-tuning (and also greatly reduce the space needed for
> buffering).

The advantage of being able to specify such a variable yourself is that 
you can depend on it. In our case we keep a counter of which document was 
last indexed/updated (and its last update time). But that's not very handy 
if you can't predict how many documents scriptindex will actually process 
before it flushes. For (quartz|xapian)compact it doesn't matter though: 
that needs to finish or its work is kinda useless.
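
To make the point about depending on the threshold concrete, here is a minimal sketch. It assumes a simplified model in which a flush happens exactly each time the configured number of documents has been added; the real indexer may behave differently. Both helper names are hypothetical. Only the XAPIAN_FLUSH_THRESHOLD environment variable comes from the discussion above.

```python
import os


def safe_checkpoint(processed, flush_threshold):
    """Documents known to be flushed under the assumed model.

    Assumption: one flush per flush_threshold documents added, so only
    whole multiples of the threshold are safely on disk.
    """
    if flush_threshold <= 0:
        raise ValueError("flush_threshold must be positive")
    return (processed // flush_threshold) * flush_threshold


def indexer_env(flush_threshold, base_env=None):
    """Copy of the environment with XAPIAN_FLUSH_THRESHOLD set.

    The result can be passed as env= when launching the indexer from
    a wrapper script.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(flush_threshold)
    return env
```

If the indexer is interrupted after 2500 documents with a threshold of 1000, safe_checkpoint(2500, 1000) gives 2000 as the point to resume from. With a self-tuning threshold that number could not be derived this way.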

Best regards,

Arjen