[Xapian-discuss] Flint Backend

Olly Betts olly at survex.com
Thu Jun 23 17:22:57 BST 2005


On Thu, Jun 23, 2005 at 04:13:23PM +0200, Arjen van der Meijden wrote:
> Olly Betts wrote:
> >You can use "quartzcompact -n" to compact but not do tag splitting to
> >fill blocks fuller (and "quartzcompact -F" to generate larger than
> >normal tag chunks and reduce size further, but I'd not recommend
> >using this if you plan to update the compacted database again).
> 
> We don't update the compacted database; if that should happen it would
> be an emergency situation, in which case we'd probably just rebuild the
> entire index from scratch.

You might as well use "-F" (aka "--fuller") then.

> Will the -n and -F work for other tables than position as well?

Yes, they work at a very low level.

> >You probably don't want to use XAPIAN_FLUSH_THRESHOLD=1000000 then,
> >especially as your documents are large.  Hopefully I can make this
> >parameter self-tuning (and also greatly reduce the space needed for
> >buffering).
> 
> The advantage of the ability to specify such a variable yourself is that 
> you can depend on it. In our case we keep a counter of which document was 
> last indexed/updated (and its last update time). But it's not that handy 
> to do that if you can't predict how many documents scriptindex will 
> actually process.

True, it's probably useful to be able to set it.  But I think most people
would prefer a decent heuristic to take care of it.
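
To illustrate why a fixed threshold is predictable, here is a minimal
sketch of the arithmetic. It assumes that each processed document counts
as exactly one change, that a flush happens automatically every
`threshold` changes, and that nothing else triggers a flush in between.
The function name is hypothetical and not part of Xapian.

```python
def last_flushed_count(processed, threshold):
    """Return how many documents are known to have been flushed.

    Assumption: one automatic flush after every `threshold` documents,
    and no other flushes in between.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return (processed // threshold) * threshold


# With a threshold of 1000000 and 2500000 documents processed, only the
# first 2000000 are known to be flushed under these assumptions.
flushed = last_flushed_count(2_500_000, 1_000_000)
```

With a self-tuning threshold this calculation would no longer be
possible from the outside, which is the concern raised above.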

My thinking is that we separate the buffered postlist entries into
"modifications" and "appends".  When building from scratch, you'll
always generate "appends", and we can actually store these in a nicely
compact form because we only need to be able to append to the growing
list for each term.
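
A rough sketch of that idea, in Python rather than Xapian's C++, and not
the actual implementation: all names here (`PostlistBuffer`,
`encode_varint`, and so on) are hypothetical, and the byte encoding is
only one plausible choice. It assumes the database starts empty, so a
posting counts as an "append" when its docid is higher than any docid
buffered so far for that term. Appends are stored as delta-encoded
variable-length integers; everything else goes into an ordinary map.

```python
from collections import defaultdict


def encode_varint(n):
    """Encode a non-negative integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Decode a byte string of concatenated varints into a list of ints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


class PostlistBuffer:
    """Toy buffer that keeps appends and modifications apart."""

    def __init__(self):
        self.appends = defaultdict(bytearray)   # term -> (docid gap, wdf) varints
        self.last_docid = defaultdict(int)      # term -> highest appended docid
        self.modifications = defaultdict(dict)  # term -> {docid: wdf}

    def add(self, term, docid, wdf):
        last = self.last_docid[term]
        if docid > last:
            # Append: only the gap from the previous docid is stored.
            self.appends[term] += encode_varint(docid - last) + encode_varint(wdf)
            self.last_docid[term] = docid
        else:
            # Modification of an already buffered range: needs random access.
            self.modifications[term][docid] = wdf

    def appended(self, term):
        """Decode the compact append list back into (docid, wdf) pairs."""
        values = decode_varints(bytes(self.appends[term]))
        docid, result = 0, []
        for gap, wdf in zip(values[0::2], values[1::2]):
            docid += gap
            result.append((docid, wdf))
        return result
```

When building from scratch with increasing docids, every call to `add`
takes the append path, so each posting costs only a few bytes in the
buffer.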

> For (quartz|xapian)compact it doesn't matter though,
> that needs to finish or its work is kinda useless.

Also, they both work on the already inverted postlist table, so they
don't need to buffer at all.  So their memory use should be fairly
modest.
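
The reason is the general property of merging sorted inputs, which can
be sketched as follows. This is an illustration of a streaming merge
only, not the compaction tools' actual code; `merge_sorted_tables` is a
hypothetical name, and each "table" is assumed to be an iterable of
(key, value) pairs already sorted by key.

```python
import heapq


def merge_sorted_tables(*tables):
    """Lazily merge iterables of (key, value) pairs, each sorted by key.

    Only one pending entry per input is held in memory at a time, so
    memory use depends on the number of inputs, not on their size.
    """
    return heapq.merge(*tables, key=lambda item: item[0])


table_a = [("apple", b"1"), ("cherry", b"3")]
table_b = [("banana", b"2"), ("date", b"4")]

# Entries come out in key order without loading either input fully.
merged = list(merge_sorted_tables(iter(table_a), iter(table_b)))
```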

Cheers,
    Olly


