[Xapian-discuss] last_mod performance

Olly Betts olly at survex.com
Fri Feb 20 06:07:52 GMT 2009


On Thu, Feb 12, 2009 at 02:04:01AM +1030, Frank J Bruzzaniti wrote:
> I found:
> 
> http://trac.xapian.org/attachment/ticket/282/omindex-assorted-enhancements.patch
> 
> Is the implementation of last_mod to skip unchanged files in this patch
> good to use?

It looks plausible.  I've not tested it, but presumably Reini has.

The useful parts of the monster patch really need splitting out, tidying
up, testing, and documenting - then we can commit them.  I've not had
time myself beyond updating it to SVN trunk and opening that ticket.

Checking last_mod does add some overhead (we need to look it up in the
database for every document), so if most documents have changed it will
probably slow things down.  The break-even point is likely to come
sooner (i.e. with a smaller proportion of unchanged files) when indexing
documents which require external filters to be run, since skipping one
of those saves more work.  It would be interesting to see where it is
for just HTML.
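
As a rough illustration of that trade-off, here is a minimal Python
sketch.  It is not the code from the patch: the dict standing in for the
database lookup, the function names and the cost model are all
assumptions made for the example.  The model says that without the check
every file costs one indexing pass, and with it every file costs one
lookup plus an indexing pass if it has changed.

```python
def needs_reindex(stored_last_mod, path, mtime):
    """Return True if the file at `path` should be (re)indexed.

    `stored_last_mod` is a hypothetical mapping from path to the
    modification time recorded by the previous index run; it stands in
    for the per-document database lookup.
    """
    previous = stored_last_mod.get(path)  # the lookup that adds overhead
    return previous is None or mtime > previous


def break_even_unchanged_fraction(lookup_cost, index_cost):
    """Fraction of unchanged files above which the check saves time.

    With N files of which a fraction u is unchanged:
      without the check: N * index_cost
      with the check:    N * lookup_cost + (1 - u) * N * index_cost
    The check wins when u > lookup_cost / index_cost.
    """
    return lookup_cost / index_cost
```

Under this model, the more expensive indexing a file is relative to the
lookup (as with external filters), the smaller the fraction of unchanged
files needed before the check pays for itself.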

On trunk we could use the value upper bound for last_mod as a cheaper
check, which would help a lot.  If a file is newer than the newest file
in the index when we started then it definitely needs reindexing, and
updated files will usually be newer than the most recent index run.
This check would also nicely handle the case of indexing starting from
an empty database.
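
A minimal sketch of how that two-stage check might look, again in Python
and again with hypothetical names.  The bound is passed in as a plain
parameter (None standing for an empty database) rather than read from a
real database, and `lookup` is an assumed callable standing in for the
per-document last_mod lookup.

```python
def needs_reindex_with_bound(mtime, newest_indexed, lookup):
    """Decide whether to reindex, trying the upper bound before a lookup.

    newest_indexed: upper bound on last_mod over the whole index as it
        was when this run started, or None for an empty database.
    lookup: callable returning the stored last_mod for this file, or
        None if the file is not indexed.  It is only called when the
        bound cannot settle the question.
    """
    if newest_indexed is None or mtime > newest_indexed:
        # Newer than anything in the index (or nothing is indexed yet):
        # the file definitely needs indexing, so skip the lookup.
        return True
    previous = lookup()
    return previous is None or mtime > previous
```

The bound must be read once, before the run adds anything, otherwise
files indexed earlier in the same run would raise it and the comparison
would no longer mean "newer than everything from previous runs".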

Cheers,
    Olly


