[Xapian-discuss] last_mod performance
Frank J Bruzzaniti
frank.bruzzaniti at gmail.com
Fri Feb 20 06:27:37 GMT 2009
I found that the last_mod check from that patch ends up deleting documents.
But I found another patch on trac that worked. I've been testing it, and
indexing time for me went down from 4 hours to about 8 minutes.
http://trac.xapian.org/ticket/285
I use a slightly modified version of it in here:
http://trac.xapian.org/attachment/ticket/290/office2007.patch
Olly Betts wrote:
> On Thu, Feb 12, 2009 at 02:04:01AM +1030, Frank J Bruzzaniti wrote:
>
>> I found:
>>
>> http://trac.xapian.org/attachment/ticket/282/omindex-assorted-enhancements.patch
>>
>> Is the implementation of last_mod to skip unchanged files in this patch
>> good to use?
>>
>
> It looks plausible. I've not tested it, but presumably Reini has.
>
> The useful parts of the monster patch really need splitting out, tidying
> up, testing, and documenting - then we can commit them. I've not had
> time myself beyond updating it to SVN trunk and opening that ticket.
>
> Checking last_mod does add some overhead (we need to look it up in the
> database for every document) so if most documents have changed, it will
> probably slow things down. The break-even point is likely to be sooner
> when indexing documents which require external filters to be run. It
> would be interesting to see where it is for just HTML.
>
> On trunk we could use the value upper bound as a cheaper check which
> would help a lot. If a file is newer than the newest file in the index
> when we started then it definitely needs reindexing, and updated
> files will usually be newer than the most recent index run. This check
> would also nicely handle the case of indexing starting from an empty
> database.
>
> Cheers,
> Olly
>
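To make the two checks Olly describes concrete, here is a rough sketch in plain Python. It is not code from either patch and does not call the Xapian API: the dict stored_last_mod is a hypothetical stand-in for the per-document last_mod lookup in the database, newest_at_start stands in for the value upper bound read once before indexing starts, and the function name needs_reindex is made up for illustration. It also assumes a file counts as changed only when its mtime is newer than the stored one.

```python
from typing import Dict, Optional


def needs_reindex(
    path: str,
    mtime: int,
    stored_last_mod: Dict[str, int],
    newest_at_start: Optional[int],
) -> bool:
    """Decide whether a file should be (re)indexed.

    stored_last_mod: hypothetical stand-in for the database, mapping each
        indexed path to the last_mod value stored for it.
    newest_at_start: newest last_mod in the index when the run started
        (the "upper bound"), or None if the database was empty.
    """
    # Cheap check: nothing indexed yet, or the file is newer than anything
    # that was in the index at the start, so no per-document lookup is needed.
    if newest_at_start is None or mtime > newest_at_start:
        return True

    # Slower path: look up the stored last_mod for this particular file.
    previous = stored_last_mod.get(path)
    if previous is None:
        return True  # never indexed before
    # Assumption: "changed" means strictly newer than what was stored.
    return mtime > previous


if __name__ == "__main__":
    stored = {"a.html": 100, "b.html": 200}
    newest = max(stored.values())  # read once, before indexing starts
    for name, mtime in [("a.html", 100), ("a.html", 150), ("c.html", 250)]:
        print(name, mtime, needs_reindex(name, mtime, stored, newest))
```

With an empty database newest_at_start is None, so every file takes the cheap path and no lookups happen at all, which is the "starting from an empty database" case mentioned above. Files updated since the last run usually take the cheap path too; only older files fall through to the per-document lookup.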