Can't handle insanely large tags

Olly Betts olly at survex.com
Thu Mar 13 21:16:53 GMT 2025


On Wed, Mar 12, 2025 at 10:01:50PM +0100, Jean-Francois Dockes wrote:
> 
> Thanks for the fast answer! I've certainly no plan to store such big objects in
> Xapian. It just means that there is a missing sanity check somewhere.
> 
> The user succeeded in pinpointing the problem to a 900 MByte mbox file.
> 
> A possible reason would be that a really bad mbox would be misparsed, producing
> e.g. an enormous Subject: or From: field which would get stored as an attribute in
> the data record. I see that I have no size checks on this at the moment. I'll
> investigate in this direction.
> 
> Can this come from anything other than the data record?

Probably - the document data is the simplest to reason about (because
it gets compressed with zlib and we have a reasonable idea how well
zlib will compress typical data).
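
As an illustration of the indexer-side size check discussed in the quoted
message, here is a minimal sketch in Python. The cap of 4096 bytes and the
names clamp_field and compressed_size are assumptions made up for this
example, not anything Xapian or the indexer defines; the second helper only
shows how much a given string shrinks under zlib at the default level.

```python
import zlib

# Hypothetical cap on a single attribute, in bytes (an assumption, tune as needed).
MAX_FIELD_BYTES = 4096


def clamp_field(value: str, limit: int = MAX_FIELD_BYTES) -> str:
    """Truncate value so that its UTF-8 encoding is at most limit bytes."""
    raw = value.encode("utf-8")
    if len(raw) <= limit:
        return value
    # errors="ignore" drops a multi-byte character cut in half at the end.
    return raw[:limit].decode("utf-8", errors="ignore")


def compressed_size(data: str) -> int:
    """Size in bytes of data after zlib compression at the default level."""
    return len(zlib.compress(data.encode("utf-8")))
```

Applying something like clamp_field to Subject: and From: before they go
into the data record would keep a misparsed mbox from producing an enormous
attribute, whatever the exact limit on the record turns out to be.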

Postlists are chunked at a higher level to support efficient
skipping forwards, so postlist table entries shouldn't be more than about
2000 bytes.
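
To show why chunking bounds the entry size and still allows skipping, here
is a rough sketch in Python. It is not Xapian's actual on-disk format: the
varint delta encoding, the function names and the way the 2000 byte cap is
applied are all assumptions for illustration.

```python
import bisect


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def chunk_postings(docids, max_chunk_bytes=2000):
    """Split increasing docids into (first_docid, payload) chunks.

    Each payload holds varint-encoded gaps from the previous docid and
    never exceeds max_chunk_bytes.
    """
    chunks = []
    first = prev = None
    payload = bytearray()
    for did in docids:
        if first is None:
            first = prev = did
            continue
        delta = encode_varint(did - prev)
        if len(payload) + len(delta) > max_chunk_bytes:
            # Chunk is full: close it and start a new one at this docid.
            chunks.append((first, bytes(payload)))
            first, payload = did, bytearray()
        else:
            payload += delta
        prev = did
    if first is not None:
        chunks.append((first, bytes(payload)))
    return chunks


def find_chunk(chunks, target):
    """Index of the chunk that would contain target, found by bisection."""
    firsts = [first for first, _ in chunks]
    return max(bisect.bisect_right(firsts, target) - 1, 0)
```

However long the posting list gets, each stored entry stays small, and a
reader can jump to the right chunk by its first docid instead of decoding
everything before it.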

Other tables might be affected - for example, if you indexed a
document by enough distinct terms you'd probably end up with a termlist
entry that's too big to store, but the encoding used tends to become
more compact the more terms there are, so it's hard to say at what point
this would happen without testing.

Cheers,
    Olly


