[Xapian-discuss] How to index a lot of documents quickly

Olly Betts olly at survex.com
Tue Mar 8 04:21:42 GMT 2005


On Thu, Mar 03, 2005 at 01:51:47PM +0000, Olly Betts wrote:
> On Thu, Mar 03, 2005 at 12:05:15AM +0000, Olly Betts wrote:
> > At present, quartzcompact doesn't produce quite the same output from
> > merging as it would when compacting a single file.  The issue is that
> > the keys in 3 tables don't exactly sort in docid order, so the merging
> > used doesn't write the keys in totally sorted order.  I'm just testing
> > to see if this adversely affects the database size.  If it does, I can
> > fix it, at the potential cost of a slightly slower merge.
> 
> I'm currently running the output of the gmane merge through
> quartzcompact.  It's reduced the size of the record table by 43% (!), so
> I think I need to address this (the postlist table is unsurprisingly
> unchanged, and the value table isn't used so that will be unchanged
> too - it's still working on the other 2).

The other two tables were reduced in size by 40% (termlist) and 26%
(position).  The total reduction in database size was 28%.

I've fixed quartzcompact to write the keys in sorted order for all tables
when merging.  You can get the latest source from CVS and drop it into
the bin subdirectory of xapian-core 0.8.5:

http://cvs.xapian.org/*checkout*/xapian/xapian-core/bin/quartzcompact.cc
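
As a rough illustration only (this is not the quartzcompact source, and
the names Entry, Table and merge_sorted are made up for the sketch),
writing keys in sorted order while merging several already-sorted inputs
can be done with a k-way merge driven by a priority queue.  It assumes
each input is sorted by key as a byte string:

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table: (key, tag) pairs sorted by key.
using Entry = std::pair<std::string, std::string>;
using Table = std::vector<Entry>;

// Merge several key-sorted tables into a single key-sorted table.
Table merge_sorted(const std::vector<Table>& inputs) {
    // A cursor points at the next unread entry of one input.
    struct Cursor {
        const std::string* key;
        std::size_t input;
        std::size_t pos;
    };
    // Order the heap so the smallest key is on top; on equal keys the
    // entry from the earlier input comes out first.
    auto later = [](const Cursor& a, const Cursor& b) {
        if (*a.key != *b.key) return *a.key > *b.key;
        return a.input > b.input;
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)>
        heap(later);

    // Seed the heap with the first entry of every non-empty input.
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (!inputs[i].empty())
            heap.push(Cursor{&inputs[i][0].first, i, 0});
    }

    Table out;
    while (!heap.empty()) {
        Cursor c = heap.top();
        heap.pop();
        out.push_back(inputs[c.input][c.pos]);
        // Advance the cursor of the input we just consumed from.
        std::size_t next = c.pos + 1;
        if (next < inputs[c.input].size())
            heap.push(Cursor{&inputs[c.input][next].first, c.input, next});
    }
    return out;
}
```

Simply appending one input after another only gives sorted output when
every key in the first input sorts before every key in the second; the
heap removes that requirement at the cost of a comparison per entry,
which is in line with the "slightly slower merge" mentioned earlier.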

Cheers,
    Olly


