[Xapian-discuss] reindexing

Richard Boulton richard at tartarus.org
Mon Oct 24 21:46:46 BST 2011


On 24 October 2011 20:15, Sym Roe <sym.roe at talusdesign.co.uk> wrote:
> Could one run 'tree' (or similar) just before each run, and then use
> diff to look for deleted/moved/added files since the previous index
> run?

Could do, and that'd be a reasonable hack, though I think parsing the
diff output safely might be a bit of a pain.  I think it's probably
better to copy the approach omindex takes (or, indeed, to copy its
code).  If I've remembered this correctly, omindex keeps a bit vector
(i.e., std::vector<bool> in C++) in which each bit corresponds to a
Xapian document ID, with every bit initially set to false.  While
walking the filesystem to do the update, it sets the bit for a
document's ID whenever it finds that document's file still present.
After it finishes walking the filesystem, omindex then
iterates through the documents in the database, checking the bit for
each one, and deleting any documents from the database for which the
bit isn't set.
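A minimal C++ sketch of that mark-and-sweep idea follows. It is not omindex's code and does not use the Xapian API: a hypothetical in-memory FakeDb (a std::map from document ID to file path, with a linear find() lookup) stands in for the database, and reindex() is an illustrative name. The list of paths passed in plays the role of the filesystem walk.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a Xapian database: document ID -> file path.
// Document IDs start at 1, as in Xapian; 0 means "not found".
using DocId = unsigned;

struct FakeDb {
    std::map<DocId, std::string> docs;
    DocId last_docid = 0;

    // Linear lookup by path; a real index would use a unique term instead.
    DocId find(const std::string& path) const {
        for (const auto& [id, p] : docs)
            if (p == path) return id;
        return 0;
    }

    DocId add(const std::string& path) {
        docs[++last_docid] = path;
        return last_docid;
    }
};

// Walk the (simulated) filesystem, marking every document still present,
// then delete every document whose bit was never set.
void reindex(FakeDb& db, const std::vector<std::string>& files_on_disk) {
    // One bit per document ID (index 0 unused), all false to start with.
    std::vector<bool> seen(db.last_docid + 1, false);

    for (const auto& path : files_on_disk) {
        DocId id = db.find(path);
        if (id == 0) {
            id = db.add(path);
            // Newly added documents get IDs past the vector's original end.
            if (id >= seen.size()) seen.resize(id + 1, false);
        }
        seen[id] = true;
    }

    // Sweep: remove documents whose files were not seen during the walk.
    for (auto it = db.docs.begin(); it != db.docs.end();) {
        if (!seen[it->first])
            it = db.docs.erase(it);
        else
            ++it;
    }
}
```

For example, indexing a.txt, b.txt and c.txt and then calling reindex(db, {"a.txt", "c.txt", "d.txt"}) keeps a.txt and c.txt, adds d.txt, and deletes b.txt. A bit vector suits this because Xapian document IDs are small integers, so one bit per ID is cheap, and deferring all deletions until the walk has finished means nothing is removed on the basis of a partial scan.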

Actually, there are several subtleties and wrinkles here; read
through the omindex code for some relevant code comments.  As a tie-in
with another thread currently on the Xapian mailing lists, it
might be interesting to work out what kind of library API would
most cleanly encapsulate the filesystem walking here, and to pull the
code that handles this reindexing out of omindex.

-- 
Richard


