[Xapian-discuss] Ticket #342: Omega: Add option to avoid reindexing unchanged files

Srijon Biswas srijon.biswas at googlemail.com
Thu May 21 08:48:19 BST 2009


Hmmm...

I am not sure that is a very good solution: it leaves the door open to
different behaviour based on conditions that the user may not be fully aware
of, if you get what I mean (moved some files around, some got updated in the
db, some did not... why?). In fact, after reading the previous mail from
Olly, while I understand the rationale for using the last-modified time, I
am still not fully comfortable with it from an implementation standpoint.
This is probably because I do not use the application the way he
described :) - so I might be making an issue out of nothing.

How about giving the user the option to choose: --strict-check (or
something similar) would compare the md5sums, and without it we go with the
last-modified time. The help documentation could mention where
--strict-check becomes useful (i.e. pretty much only if you have been
moving files around as opposed to modifying/editing them).
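
To make the idea concrete, here is a rough sketch of how the switch could
look. This is only an illustration, not Omega's code: --strict-check is the
flag name suggested above, while needs_reindex, build_parser and the layout
of the stored record (a dict holding "mtime_ns" and "md5" from the last
indexing run) are names assumed for the example.

```python
import argparse
import hashlib
import os


def md5sum(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(path, stored, strict=False):
    """Decide whether a file should be indexed again.

    stored is a hypothetical record saved at the last run, a dict with
    "mtime_ns" and "md5", or None if the file has never been indexed.
    """
    if stored is None:
        return True
    if strict:
        # Strict mode: compare content hashes, whatever the timestamps say.
        return md5sum(path) != stored["md5"]
    # Default mode: only a newer last-modified time triggers reindexing.
    return os.stat(path).st_mtime_ns > stored["mtime_ns"]


def build_parser():
    """Command-line parser for a hypothetical indexer script."""
    parser = argparse.ArgumentParser(description="hypothetical indexer")
    parser.add_argument(
        "--strict-check",
        action="store_true",
        help="compare md5sums instead of last-modified times "
             "(useful mainly after moving files around)",
    )
    return parser
```

The strict path has to read every file in full, which is the extra burden
mentioned earlier in the thread, so it stays off unless asked for.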

Just a suggestion :) - I think people who use the app frequently/extensively
will be in a better position to comment.

If this seems reasonable, I'll try putting this into the Python app (in
the next iteration). I'll try to push out the first version today or so.

Thanks,
Srijon.


On Wed, May 20, 2009 at 8:16 PM, Richard Boulton <richard at lemurconsulting.com> wrote:

> 2009/5/20 Olly Betts <olly at survex.com>:
> >> Maybe the test for changed content should depend on the md5sum and not on
> >> the date (even though this does add more burden than just checking the
> >> last mod date). Something roughly like this:
> >
> > Yes, it's quite a lot more work, but it would save some work.  A fuller
> > solution to ticket #250 would reduce the gain here, but there would
> > probably still be some:
>
> Checking if the file size has changed as well as the date is another
> approach - it doesn't cause all changes to be noticed, of course, but
> it's a lot cheaper than computing the MD5 sum of the file (if you've
> done a stat(), you've already got the size available).
>
> --
> Richard
>
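
A minimal sketch of the size-plus-date check described in the quoted mail,
assuming the indexer keeps a (size, mtime) pair per file from its last run.
The names stat_signature and looks_unchanged are made up for the example and
are not part of Omega.

```python
import os


def stat_signature(path):
    """Return (size, mtime in nanoseconds) from a single stat() call."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def looks_unchanged(path, stored_signature):
    """True if size and mtime both match the signature saved last time.

    This is cheap, but an edit that keeps both the size and the
    timestamp the same will go unnoticed.
    """
    return stat_signature(path) == stored_signature
```

Both values come from the same stat() result, so the size check costs
nothing beyond the date check.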

