[Xapian-discuss] Question about embedding Xapian index in one file
olly at survex.com
Sun Dec 1 21:57:49 GMT 2013
On Wed, Nov 27, 2013 at 09:59:29PM +0100, Emmanuel Engelhart wrote:
> Le 27/11/2013 21:49, Wilm Schumacher a écrit :
> > 1.) a possible solution: I would not do this by xapian, but by the
> > operating system. You could create a "mountable file", and mount it to a
> > directory of your choice and let Xapian read and write in this
> > directory. All changes would happen in this one file.
> This is not really an option for us; our software supports a lot of
> different OSes, and I don't really want to start to add such type of
A more portable "external" solution would be an LD_PRELOAD library which
translates I/O library calls to allow a tar file to look like a
directory to Xapian. Or some other suitable format - e.g. 'ar' format
may be a simpler alternative to tar (you need to be able to efficiently
seek to particular offsets in the .DB files, so I'd recommend not using
a compressed container). Also, I'd recommend padding to align the .DB
files with the FS block size - probably aligning with the Xapian block
size makes sense.
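To make the container idea a bit more concrete, here is a minimal C++17 sketch of the layout step. It assumes a simple uncompressed container of our own design rather than real tar or ar headers. Each .DB file is placed after a header region and padded so that its data starts on a multiple of the block size. `Member`, `layout()`, `lookup()` and `container_offset()` are hypothetical names. An LD_PRELOAD shim would do something like `lookup()` when Xapian opens a file in the "directory", and would then apply `container_offset()` to every seek or read.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one .DB file stored inside the container.
struct Member {
    std::string name;      // e.g. "postlist.DB"
    std::uint64_t offset;  // byte offset of the file's data in the container
    std::uint64_t length;  // size of the file in bytes
};

// Round n up to the next multiple of align.
std::uint64_t round_up(std::uint64_t n, std::uint64_t align) {
    assert(align != 0);
    return (n + align - 1) / align * align;
}

// Place files one after another following a header of header_size bytes,
// padding so that each file's data starts on a block_size boundary.
std::vector<Member> layout(
        const std::vector<std::pair<std::string, std::uint64_t>>& files,
        std::uint64_t header_size, std::uint64_t block_size) {
    std::vector<Member> members;
    std::uint64_t pos = header_size;
    for (const auto& [name, size] : files) {
        pos = round_up(pos, block_size);
        members.push_back({name, pos, size});
        pos += size;
    }
    return members;
}

// What a (hypothetical) LD_PRELOAD shim might do on open():
// find the member that a path inside the "directory" refers to.
std::optional<Member> lookup(const std::vector<Member>& members,
                             const std::string& name) {
    for (const auto& m : members)
        if (m.name == name) return m;
    return std::nullopt;
}

// ...and on each read or seek: turn an offset within the .DB file into an
// offset within the container (the shim must refuse reads past the end).
std::uint64_t container_offset(const Member& m, std::uint64_t file_offset) {
    assert(file_offset <= m.length);
    return m.offset + file_offset;
}
```

For example, with a 512-byte header and 8192-byte blocks, a 10000-byte postlist.DB starts at offset 8192 and the next file starts at 24576. The padding costs at most one block per file, and in return every Xapian block read maps to a single aligned read of the container.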
A neat way to implement this inside Xapian would be to renumber the
blocks in the tables so they use disjoint ranges, then concatenate the
.DB files into a single file. You'd need to store the information you
need from the base files somewhere, perhaps in a fake first or last
block - you don't need the bitmaps for reading, only the data in the
base file headers.
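A rough sketch of the renumbering follows. It assumes that block 0 of the combined file is the fake header block and that each table's blocks are copied across in their original order. `TableInfo` and its fields are hypothetical stand-ins for whatever would be kept from the base file headers, not Xapian's real base file format. Each table gets a contiguous range of block numbers that does not overlap any other table's range. Every block pointer inside a table (for example, child pointers in branch blocks) is then shifted by that table's starting block. Rewriting the block contents themselves is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-table summary kept in the fake header block.
// (Illustrative fields only; not Xapian's actual base file layout.)
struct TableInfo {
    std::string name;
    std::uint32_t num_blocks;   // blocks in the original .DB file
    std::uint32_t root;         // root block number, local to the table
    std::uint32_t first_block;  // set by renumber(): global number of local block 0
};

// Block 0 is reserved for the fake header block, so table data starts at
// block 1 and each table gets the next disjoint range of block numbers.
void renumber(std::vector<TableInfo>& tables) {
    std::uint32_t next = 1;
    for (auto& t : tables) {
        t.first_block = next;
        next += t.num_blocks;
    }
}

// Convert a table-local block number (e.g. a child pointer) to a global one.
std::uint32_t to_global(const TableInfo& t, std::uint32_t local) {
    assert(local < t.num_blocks);
    return t.first_block + local;
}
```

With tables of 100, 50 and 10 blocks, the first table occupies blocks 1 to 100, the second 101 to 150, and the third 151 to 160.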
The neat thing is that once you've opened the file, all the offsets will
just be correct, so you just need to adjust the opening and closing
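Once the pointers hold global block numbers, reading a block needs no per-table offset: block n is simply at byte n * block_size of the combined file. Below is a minimal sketch using a standard `std::istream`, so it can be tried on an in-memory stream. `read_block()` is a hypothetical helper, not part of Xapian's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>   // std::istringstream, for trying it out in memory
#include <stdexcept>
#include <string>

// Read global block `block` from the combined file. There is no table-specific
// base offset: the renumbering already made block numbers file-wide.
std::string read_block(std::istream& in, std::uint32_t block,
                       std::size_t block_size) {
    assert(block_size > 0);
    std::string buf(block_size, '\0');
    in.clear();  // forget any failure state left over from an earlier call
    in.seekg(static_cast<std::streamoff>(block) *
             static_cast<std::streamoff>(block_size));
    if (!in.read(&buf[0], static_cast<std::streamsize>(block_size)))
        throw std::runtime_error("short read of block " + std::to_string(block));
    return buf;
}
```

For instance, on `std::istringstream("AAAABBBBCCCC")` with a 4-byte block size, block 1 reads back as "BBBB". Asking for block 3 throws, because it lies past the end of the data.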
This approach could also feasibly be extended to support writing - it
would just need a way to get the next free block appropriately, which
isn't much different to doing the same for a normal table. Handling the
bitmap in the same file would be awkward, but I'm working on a patch
to use freelists instead of bitmaps, which would work nicely with this.
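For the writing case, a free-list allocator over the combined file might look like the following sketch. It assumes that all tables share a single pool of blocks. This is a generic illustration, not the freelist patch mentioned above. Here the list is held in memory, and storing it in the file itself is left out. `BlockAllocator` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Generic free-list block allocator for one combined file (illustration only).
class BlockAllocator {
public:
    // first_unused: the first block number past the current end of the file.
    explicit BlockAllocator(std::uint32_t first_unused) : end_(first_unused) {}

    // Reuse a freed block if there is one; otherwise grow the file by a block.
    std::uint32_t allocate() {
        if (!free_.empty()) {
            std::uint32_t b = free_.back();
            free_.pop_back();
            return b;
        }
        return end_++;
    }

    // Hand a block that is no longer needed back for reuse.
    void release(std::uint32_t block) {
        assert(block < end_);
        free_.push_back(block);
    }

    std::uint32_t end() const { return end_; }

private:
    std::vector<std::uint32_t> free_;  // LIFO list of reusable block numbers
    std::uint32_t end_;                // next block number at the end of the file
};
```

Because new blocks come either from the list or from the end of the file, no table needs to know where the other tables' blocks are. That is why a freelist fits the single-file layout more easily than a per-table bitmap.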