[Xapian-discuss] Using xapian for general indexed storage

Olly Betts olly at survex.com
Sun Nov 15 12:36:55 GMT 2009


On Fri, Nov 13, 2009 at 03:08:56PM +0100, Jean-Francois Dockes wrote:
> Two questions about using Xapian as a gdbm stand-in for an auxiliary
> database:
> 
>  - I am currently using single-term documents having the key as a single
>    term, and the (small) associated data chunk stored in the document data
>    record. Is this still the right way to do it?

If you just want a key/value store, it would be a bit more efficient
to store the pairs as user metadata (as you wouldn't have the
termname->docid translation stage):

http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#4a8d53e528bda6cee8e507b95f5c0b31

Note, though, that Xapian currently tries to compress document data with
zlib, but doesn't try to compress user metadata.  This may change in the
future - I don't think it was a deliberate decision, just a consequence of
where the user metadata is stored.
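
If the missing compression matters, the caller can compress values before
storing them.  Here is a rough sketch in Python, not tested against a real
database.  It assumes the Python bindings, where WritableDatabase has
set_metadata(key, value) and get_metadata(key), with get_metadata returning
an empty value for an unset key.  The helper names (pack_value, unpack_value,
put_value, get_value) and the one-byte flag prefix are made up for
illustration, and DictStore is only an in-memory stand-in so the snippet
runs without Xapian installed.

```python
import zlib

RAW = b"\x00"       # hypothetical flag: value stored as-is
DEFLATED = b"\x01"  # hypothetical flag: value compressed with zlib


def pack_value(data: bytes) -> bytes:
    """Compress data with zlib, keeping the result only if it is smaller."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return DEFLATED + packed
    return RAW + data


def unpack_value(stored: bytes) -> bytes:
    """Reverse pack_value()."""
    flag, body = stored[:1], stored[1:]
    if flag == DEFLATED:
        return zlib.decompress(body)
    if flag == RAW:
        return body
    raise ValueError("unknown storage flag: %r" % flag)


class DictStore:
    """In-memory stand-in exposing the two method names assumed above."""

    def __init__(self):
        self._items = {}

    def set_metadata(self, key, value):
        self._items[key] = value

    def get_metadata(self, key):
        # Assumption: an unset key reads back as an empty value.
        return self._items.get(key, b"")


def put_value(db, key, data: bytes) -> None:
    """Store data under key, compressed when that saves space."""
    db.set_metadata(key, pack_value(data))


def get_value(db, key):
    """Return the stored bytes, or None if the key is unset."""
    stored = db.get_metadata(key)
    if not stored:
        return None
    return unpack_value(stored)
```

The flag byte means a packed value is never empty, so storing an empty
chunk can't be confused with an unset key.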

>  - There was an answer on the mailing list two years ago, saying that
>    storing a few megabytes in the document data records was ok. Does
>    this still hold, or would it be preferable to use file-system storage ?
>    There is no question of peeking inside the data, it's opaque.
> 
> http://article.gmane.org/gmane.comp.search.xapian.general/4730/match=document+data+record+storing

Just to be clear, when Richard says "2MB is fine", he means "2MB is well
within the upper limit" rather than that he particularly recommends doing
it, since he goes on to say:

    It's probably a mistake to try storing that much data, anyway; while it 
    should work, you'll end up with a single very large file in the Xapian 
    database directory holding the records, which might be a pain when 
    taking backups, etc.  Also, Xapian doesn't provide you with any ability 
    to perform random access on the document data - you have to read it 
    all into memory to access it: if the data was stored in a file, the 
    operating system can access it much more efficiently.

The same applies to user metadata.
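
To illustrate the random access point: with the data in a plain file you
can seek to an offset and read just the part you need, which neither
document data nor user metadata allows.  A minimal sketch in Python; the
function name read_range is made up for illustration, and nothing here
involves Xapian at all.

```python
def read_range(path, offset, length):
    """Return up to `length` bytes of the file starting at byte `offset`.

    Only the requested range is read; the rest of the file is not loaded
    into memory.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

For a multi-megabyte chunk this reads only `length` bytes, whereas fetching
the same chunk from the database means pulling the whole record into memory
first.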

The advantage of using Xapian is that you can get changes committed
atomically along with other changes to the database.  And it does avoid
having to open a file for each item you want to read.

Cheers,
    Olly


