Patch to make rsync preserve access times

Wed Sep 5 02:53:33 EST 2001

According to Martin Pool:
> First of all, I agree that this would be a good feature.  It has been
> discussed before and for various reasons never merged, but we can
> certainly look again.

I'd certainly appreciate that.

> In fact, there are two related features
> 
>  (a) Don't update the atime on source files as they're read, which is
>      what people normally ask for.
> 
>  (b) Propagate atimes from source to destination.
> 
> This patch seems to implement (b) but not (a).  

No, the patch would appear to implement both (b) and (a) together in one
option.  A considerable amount of the patched code is there to reset access
times on source files at the many places where they are read.  (That,
however, may not have been obvious from my message, since I only included
the portions of the patch that I was commenting on, none of which concerned
that particular code.)

Arguably, the code could be split up and controlled through two separate
options, corresponding to (a) and (b) above.  However, it could also be
argued that one without the other would be of questionable value.  (I.e.
the idea is to try to keep the access times in sync on both sides.)

> To some extent this is not so useful, since otherwise the atimes will
> be updated on the first run and it will only work the first time you
> replicate.  This goes against the goal of rsync to be idempotent on
> repeated transfers.

I agree with that, but the patch does keep things consistent.  I've only
done limited tests on it so far, but it does seem to do what it should.

> Leaving that aside, the big problem is that trying to preserve access
> times on Unix is inherently flaky.  There is no way to open a file
> without touching its atime.  You might see this as a security feature,
> though not a very good one.

I'm not sure that opening the file is sufficient to touch the atime, but
reading certainly does that.  So, yes, there are probably a lot of cases
that have to be considered, to make sure you're always preserving atime
after any read.

As for security issues, I don't think relying on atime is of much value
from that perspective, but atime is useful in other situations, e.g. on
mailboxes.

> The way GNU tar works around this, if I understand correctly, is to
> remember the original atime, and then go back and reset it after
> reading.

Yes, that's my understanding too, but I've never looked at the code.  Note
also that the cpio command has had a "-a" option to do the same sort of
thing going back to the early days of UNIX and BSD.
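
For illustration only, here is a minimal Python sketch of that
save-and-restore idea.  It is not taken from GNU tar, cpio, or the patch
under discussion, and the function name read_preserving_atime is made up
for this example.  It assumes the caller owns the file (or is privileged),
since setting explicit timestamps requires that.

```python
import os
import tempfile


def read_preserving_atime(path):
    """Read a file, then put back the atime (and mtime) it had beforehand."""
    before = os.stat(path)              # remember the original timestamps
    with open(path, "rb") as f:
        data = f.read()                 # this read may bump the atime
    # utime() sets atime and mtime together, so the saved mtime is passed
    # through unchanged.  Side effect: this call itself updates the ctime.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data


if __name__ == "__main__":
    # Small demonstration on a throwaway file.
    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"some mailbox contents\n")
    os.close(fd)
    atime_before = os.stat(demo_path).st_atime_ns
    read_preserving_atime(demo_path)
    atime_after = os.stat(demo_path).st_atime_ns
    print("atime preserved:", atime_before == atime_after)
    os.remove(demo_path)
```

The window between the stat() and the utime() is exactly where the race
mentioned below lives: an access by another program in that window would
be overwritten by the restore.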

> But there's a bad race here: some other program might also
> be accessing it, and so the time ought to be updated.  And of course
> this also touches the ctime, which is perhaps not a good tradeoff.

These may be problems in theory, but in my experience, they're not really
problems in practice.  Despite the possibility of a race condition, I think
this capability would be useful in rsync, just as it has been in GNU tar and
in cpio for many years.  In fact, it could potentially be more useful in
rsync, IMHO.

Some people may be concerned about the mods to ctime, which can hardly be
avoided, but I don't see that being a problem in practice.  I'm already used
to ctimes being changed at the drop of a hat, and don't find them reliable
for much.  :)  Besides, ctime will always be updated on the destination
side, which makes it virtually impossible to synchronize ctimes in any case.
(So, this is not likely to be a problem in any sort of practical application
of rsync.)

> I don't think there's any good solution possible in userspace.

Well, there's not a perfect solution possible in userspace.  (You would
likely have to do as the dump/ufsdump command does, and run through the file
system yourself, bypassing the OS, in order to be completely stealthy on the
source side.)  However, I don't think a perfect solution is needed in order
for it to be considered a "good" solution, or at least a useful one.

> Arguably Unix/Linux ought to add an open mode O_NOATIME, just as there
> is a mount option.  If Linux had that we could certainly use it.
> There are still some tricky cases.

That may be one way to do it, but it wouldn't port well to other UNIX and
UNIX-like systems.  (I'm particularly interested in Solaris support, as well
as Linux.)  So, a more general solution is called for, IMHO.  Of course, if
the feature were to exist in the OS, it would be a good thing to use, and
then you could just fall back to the other way of doing it on systems where
you have no better method.
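
As a rough sketch of that "use it where it exists, fall back otherwise"
arrangement, again in Python and purely illustrative: the function name
read_quietly is hypothetical, and the code assumes a platform where
Python exposes os.O_NOATIME when the OS has such a flag (it is simply
absent elsewhere).  Both paths assume the caller owns the file or is
privileged.

```python
import os


def read_quietly(path):
    """Read a file while trying not to disturb its access time."""
    # 0 on platforms where Python does not define the flag.
    noatime = getattr(os, "O_NOATIME", 0)
    if noatime:
        # OS-level support: ask the kernel not to update atime at all.
        # This needs file ownership or privilege, and may not be honoured
        # by every filesystem.
        fd = os.open(path, os.O_RDONLY | noatime)
        with os.fdopen(fd, "rb") as f:
            return f.read()
    # Portable fallback: remember the timestamps, read, then restore them.
    # Unlike the branch above, this is racy and touches the ctime.
    before = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data
```

The caller sees the same interface either way; only the side effects
(ctime change, race window) differ between the two branches.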

> At the moment I think unless there is a reliable and consistent way to
> do this we should hold off.

I'm sorry to hear that.  Please have another look at the original patch,
and see if it does indeed (as it appears to) handle the touching back of
atime on all source files that are read, and does so in a reasonable way.
If some cases are currently missed, then it would be worth dealing with
them.  However, I don't think we need a perfect solution in order for it
to be considered reliable, consistent, and useful.

I find the patch (with my corrections) to be important enough that I
certainly intend to use it on my systems.  However, I am concerned about
having to continue to patch the patch :) in order for it to work with future
rsync versions, and having a non-standard extension to the protocol which
could in future create incompatibilities with other (non-patched) versions
of rsync.  This is why I'd really like this feature to be seriously
considered for inclusion, so that it can be implemented right.

Thanks for your time, for your reply, and for your consideration.

-- 
Gilbert E. Detillieux		E-mail:	<gedetil at cs.umanitoba.ca>
Dept. of Computer Science	Web:	http://www.cs.umanitoba.ca/~gedetil/
University of Manitoba		Phone:	(204)474-8161
Winnipeg, MB, CANADA  R3T 2N2	Fax:	(204)474-7609
"Cautionary tales don't end with 'It was SO COOL!'" - Malcolm in the Middle