[Xapian-discuss] last_mod performance

Frank J Bruzzaniti frank.bruzzaniti at gmail.com
Fri Feb 20 06:27:37 GMT 2009


I found that the last_mod implementation from that patch ends up deleting documents.
However, I found another patch on Trac that works. I've been testing it, and 
my indexing time has gone down from 4 hours to about 8 minutes.

http://trac.xapian.org/ticket/285

I use a slightly modified version of it in this patch:

http://trac.xapian.org/attachment/ticket/290/office2007.patch



Olly Betts wrote:
> On Thu, Feb 12, 2009 at 02:04:01AM +1030, Frank J Bruzzaniti wrote:
>   
>> I found:
>>
>> http://trac.xapian.org/attachment/ticket/282/omindex-assorted-enhancements.patch
>>
>> Is the implementation of last_mod to skip unchanged files in this patch 
>> good to use?
>>     
>
> It looks plausible.  I've not tested it, but presumably Reini has.
>
> The useful parts of the monster patch really need splitting out, tidying
> up, testing, and documenting - then we can commit them.  I've not had
> time myself beyond updating it to SVN trunk and opening that ticket.
>
> Checking last_mod does add some overhead (we need to look it up in the
> database for every document) so if most documents have changed, it will
> probably slow things down.  The break-even point is likely to be sooner
> when indexing documents which require external filters to be run.  It
> would be interesting to see where it is for just HTML.
>
> On trunk we could use the value upper bound as a cheaper check which
> would help a lot.  If a file is newer than the newest file in the index
> when we started then it definitely needs reindexing, and updated
> files will usually be newer than the most recent index run.  This check
> would also nicely handle the case of indexing starting from an empty
> database.
>
> Cheers,
>     Olly
>   
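The trade-off Olly describes (one last_mod lookup per document versus a single upper-bound check taken when indexing starts) can be sketched as below. This is a minimal Python model, not omindex's code or Xapian's API: the index is assumed to be a plain dict mapping each file's path to the last_mod time stored when it was indexed, the function names are hypothetical, and `bound` merely stands in for the value upper bound of whichever slot would hold last_mod. Each function counts how many per-document lookups it performs.

```python
"""Hypothetical model of 'skip unchanged files' logic for an indexer.

Assumption: the index is a dict mapping each document's path to the
last_mod time recorded when it was indexed. A real indexer would read
these values from the Xapian database instead.
"""

from typing import Dict, List, Optional, Tuple

Files = List[Tuple[str, int]]  # (path, modification time) pairs
Plan = Tuple[List[str], List[str], int]  # (to reindex, to skip, lookups)


def plan_naive(files: Files, stored: Dict[str, int]) -> Plan:
    """Check every file against its stored last_mod: one lookup per file."""
    reindex, skip, lookups = [], [], 0
    for path, mtime in files:
        lookups += 1
        old = stored.get(path)
        if old is None or mtime > old:
            reindex.append(path)  # new file, or modified since last indexed
        else:
            skip.append(path)
    return reindex, skip, lookups


def plan_with_bound(files: Files, stored: Dict[str, int]) -> Plan:
    """Take an upper bound on stored last_mod values once, up front, and
    skip the per-file lookup for files that are definitely newer."""
    # Stand-in for the value upper bound. Any value >= the true maximum is
    # safe; a looser bound only means fewer files take the shortcut.
    bound: Optional[int] = max(stored.values(), default=None)
    reindex, skip, lookups = [], [], 0
    for path, mtime in files:
        if bound is None or mtime > bound:
            # Empty index, or newer than everything in it: reindex, no lookup.
            reindex.append(path)
            continue
        lookups += 1
        old = stored.get(path)
        if old is None or mtime > old:
            reindex.append(path)
        else:
            skip.append(path)
    return reindex, skip, lookups


if __name__ == "__main__":
    stored = {"a.html": 100, "b.pdf": 200}
    files = [("a.html", 100), ("b.pdf", 250), ("c.doc", 50), ("d.html", 300)]
    print(plan_naive(files, stored))       # 4 lookups
    print(plan_with_bound(files, stored))  # same result, 2 lookups
    print(plan_with_bound(files, {}))      # empty index: 0 lookups
```

With the sample data, both plans reindex `b.pdf`, `c.doc` and `d.html` and skip `a.html`, but the bounded version only looks up the two files whose times are not newer than the bound. Against an empty index it does no lookups at all, which matches the point about indexing from an empty database.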


