Storing the document's text: data record or value?

Jean-Francois Dockes jf at dockes.org
Wed Jan 3 15:18:18 GMT 2018


Hi,

Following the Recoll snippets generation performance problem caused by the
new positions list storage scheme in Xapian 1.4, I am experimenting with
generating snippets from the complete document text stored in the index.

This increases the index size much less than I would have expected
(apparently around 10-15% with my home directory data), which is obviously
good news.

I have tried storing the text in the data record, or in a value (after
compressing it). Storing in a value uses a tiny bit more space, I am
guessing because of the co-compression of related data occurring when
storing in the data record.
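
To illustrate the guess above, here is a minimal sketch using only Python's
zlib module. It is not Xapian code and says nothing about how Xapian
actually lays out its tables. The sample metadata and text are made up, and
compressed_sizes() is a hypothetical helper. It compares compressing a small
metadata record together with the text against compressing the two pieces
separately.

```python
import zlib


def compressed_sizes(metadata, text):
    """Return (size compressed together, total size compressed separately)."""
    together = len(zlib.compress(metadata + text))
    separate = len(zlib.compress(metadata)) + len(zlib.compress(text))
    return together, separate


# Hypothetical sample data: a small metadata record and a body that
# repeats some of its words (here, the title).
metadata = (b"url=file:///home/me/notes/snippets.txt\n"
            b"title=Xapian snippets notes\n"
            b"mtype=text/plain\n")
text = b"Xapian snippets notes. Generating snippets from the stored text. " * 20

together, separate = compressed_sizes(metadata, text)
print("together:", together, "separate:", separate)
```

With a single stream there is only one zlib header and checksum, and the
text can reference strings already seen in the metadata, so the combined
size should come out slightly smaller. The difference is small, which
would be consistent with the "tiny bit more space" observed.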

Seen from the outside, it would appear to make sense to use values, so that
code which needs to access the data record but not the full document text
does not pay a performance penalty.
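
The access-cost argument can be sketched with a toy model, again in plain
Python and not with the Xapian API. The two dict layouts, TEXT_SLOT, and the
helper names are all hypothetical, and the model assumes a value is only
read when it is explicitly requested. In Xapian terms, the "data" entry
would correspond to what Document::set_data() stores and the "values" entry
to what Document::add_value() stores.

```python
import zlib

TEXT_SLOT = 0  # hypothetical value slot number for the document text


def record_layout(meta, text):
    # Layout A: metadata and full text share one compressed data record.
    return {"data": zlib.compress(meta + b"\n\n" + text), "values": {}}


def value_layout(meta, text):
    # Layout B: the data record holds only metadata; the compressed text
    # lives in a separate value slot.
    return {"data": zlib.compress(meta),
            "values": {TEXT_SLOT: zlib.compress(text)}}


def read_metadata(doc):
    # Return the metadata and the number of bytes that had to be
    # decompressed to get at it.
    raw = zlib.decompress(doc["data"])
    return raw.split(b"\n\n", 1)[0], len(raw)


meta = b"title=Notes\nmtype=text/plain"
text = b"Full document text used only for snippet generation. " * 200

meta_a, cost_a = read_metadata(record_layout(meta, text))
meta_b, cost_b = read_metadata(value_layout(meta, text))
print(cost_a, cost_b)
```

In this model, code that only wants the metadata decompresses the whole
text in layout A, but only the metadata itself in layout B.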

I am wondering if there are other arguments for using either method.

Cheers,

jf


