[Xapian-discuss] reindexing
Liam
xapian at networkimprov.net
Tue Oct 25 02:45:59 BST 2011
On Mon, Oct 24, 2011 at 1:46 PM, Richard Boulton <richard at tartarus.org> wrote:
> On 24 October 2011 20:15, Sym Roe <sym.roe at talusdesign.co.uk> wrote:
> > Could one run 'tree' (or similar) just before each run, and then use
> > diff to look for deleted/moved/added files since the previous index
> > run?
>
> Could do, and that'd be a reasonable hack, though I think parsing the
> diff output safely might be a bit of a pain. I think it's probably
> better to copy the approach omindex takes (or, indeed, to copy its
> code). If I've remembered this correctly, omindex keeps a bit vector
> (i.e., std::vector<bool> in C++), in which each bit represents the
> Xapian document ID of a document; initially set to false for all
> documents in the database. When walking the filesystem to do the
> update, it sets each bit if the document with the corresponding ID
> exists. After it finishes walking the filesystem, omindex then
> iterates through the documents in the database, checking the bit for
> each one, and deleting any documents from the database for which the
> bit isn't set.
>
> Actually, there are several subtleties and wrinkles here - read
> through the omindex code for some relevant code comments; as a tie in
> with another thread on the Xapian mailing lists at the moment - it
> might be interesting to work out what kind of library API would
> encapsulate the filesystem walking here most cleanly, and to pull the
> code from omindex which handles this reindexing out.
>
A tree walker module would invoke the mime-converter and index updater
modules for each file, then do the delete pass itself. Perhaps:
void index_tree(std::string& rootpath, WritableDatabase& db,
                IndexOptions& opts, std::ostream& report);
// writes results to report