[Xapian-discuss] Flint Backend

Thu Jun 23 07:25:31 BST 2005

On 11-6-2005 15:19, Olly Betts wrote:
> On Sat, Jun 11, 2005 at 11:53:05AM +0200, Arjen van der Meijden wrote:
> 
> I've only run one example through it so far, which was artificial data.
> Also I don't have a "flintcompact" yet.  So it's not totally easy to
> compare but the uncompacted flint position table was about 15% smaller
> than the compacted quartz one (if I remember correctly).  However flint
> does a better job of being compact to start with.

quartzcompact doesn't do that much for the position table: it shrinks it
from 7.8GB (for a db that has been in use for quite some time now) to
7.0GB, a saving of roughly 10% (actually more than I had expected).
Of course I can't tell how much of that is overhead accumulated through
long use and how much is genuine compaction gain. The position table is
not zlib-compressed.
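For what it's worth, here is a quick check of the percentage as a small POSIX sh + awk sketch (the function name pct_saved is just an illustrative choice). The 0.8GB saved is about 10.3% of the original 7.8GB, or about 11.4% if you measure it against the compacted 7.0GB, so the figure you quote depends on which size you take as the base.

```sh
#!/bin/sh
# pct_saved BEFORE AFTER BASE
# Prints the space saved as a percentage, to one decimal place.
# BASE is "before" (divide by the original size) or "after"
# (divide by the compacted size). BEFORE and AFTER must use the same units.
pct_saved() {
  awk -v b="$1" -v a="$2" -v base="$3" 'BEGIN {
    d = (base == "after") ? a : b
    printf "%.1f\n", (b - a) / d * 100
  }'
}

# Position table sizes from above: 7.8GB before quartzcompact, 7.0GB after.
echo "vs. original size:  $(pct_saved 7.8 7.0 before)%"
echo "vs. compacted size: $(pct_saved 7.8 7.0 after)%"
```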

> I'm certainly interested to hear results of converting real-world
> databases to flint (especially on positionlist table size).  You can
> do this like so (assuming sh, bash, zsh or similar):
> 
> XAPIAN_PREFER_FLINT=1 XAPIAN_FLUSH_THRESHOLD=1000000 copydatabase <qdir> <fdir>
> 
> Where <qdir> is the existing Quartz database and <fdir> is the directory
> to create the flint database in.

Will this give usable figures with the current flint backend, or do the
bugs you found affect the size of the index in particular?

> Reduce 1000000 if you don't have loads of memory.  If this number is
> more than the number of documents, you'll get something roughly
> equivalent to what "flintcompact -n" would give, if flintcompact
> existed!

So with 1000000 it would try to hold _all_ the data for the 1M documents
in memory before flushing anything to disk?
We do have about 1M documents, but I'd guess their data takes up much
more than the 4GB of memory the production machine has; you can see
above how big our position table is. The development machines here
'only' have 1GB.

> But beware that copydatabase is inherently a lot slower than
> quartzcompact because copydatabase reinverts the data whereas quartzcompact
> copies the already inverted data.

quartzcompact already takes about two and a half hours on the production
machine, so I can't risk running an even slower job there. A development
machine can do it, though.

Best regards,

Arjen