[PATCH] Consider nanoseconds when quick-checking for unchanged files

Fri Jan 22 17:11:25 UTC 2016

On Wed, 20 Jan 2016 23:04:20 -0800
Wayne Davison <wayned at samba.org> wrote:
> 
> The problem is that if you transfer from a filesystem that has
> nanoseconds to one that does not support it, rsync would consider
> most of the files to be constantly different, since the nanosecond
> values would only match if the source file happened to have 0
> nanoseconds. So, the logic has to be improved to somehow detect such
> a case and treat the truncated values as equal. One possible
> improvement would be to skip the nanosecond check if the destination
> file has a nanosecond value of 0.  That could possibly be improved if
> we figure out if a particular device ID supports nanoseconds
> somehow.

Seems to me that nanoseconds are the sort of thing that could give
sysadmins crazy headaches.  My thought is to have a declaration
in the rsync configuration file (that can be overridden on
the command line), something like "--nanosecond".
It'd take one of the following values:

ignore   : Ignore nanoseconds.  (default)

update   : Ignore nanoseconds, but update destination timestamps
           when nanoseconds differ.

heuristic: Check nanoseconds with Wayne's spiffy heuristic.

check    : Check nanoseconds.
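
To make the four values concrete, here is a rough Python sketch of how
each one might drive the quick-check.  None of this is rsync code; the
names (NsMode, mtimes_match, needs_time_fixup) are made up for
illustration, and a timestamp is assumed to be a (seconds, nanoseconds)
pair.  The heuristic branch is Wayne's suggestion from above: skip the
nanosecond comparison when the destination has a nanosecond value of 0.

```python
from enum import Enum


class NsMode(Enum):
    """Hypothetical values for the proposed --nanosecond option."""
    IGNORE = "ignore"
    UPDATE = "update"
    HEURISTIC = "heuristic"
    CHECK = "check"


def mtimes_match(mode, src, dst):
    """Return True if the quick-check should treat the mtimes as equal.

    src and dst are (seconds, nanoseconds) tuples (an assumption made
    for this sketch).
    """
    if src[0] != dst[0]:
        return False  # whole seconds differ: always a mismatch
    if mode in (NsMode.IGNORE, NsMode.UPDATE):
        return True  # nanoseconds play no part in the comparison
    if mode is NsMode.HEURISTIC and dst[1] == 0:
        return True  # destination probably truncated the value
    return src[1] == dst[1]


def needs_time_fixup(mode, src, dst):
    """In 'update' mode, report when only the nanoseconds differ so the
    destination timestamp can be rewritten without a transfer."""
    return mode is NsMode.UPDATE and src[0] == dst[0] and src[1] != dst[1]
```

With "heuristic", a destination value of 0 is taken to mean the
filesystem dropped the nanoseconds, so a source file that really has 0
nanoseconds and one that does not are treated the same way.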

When there is a conflict between the conf files of the 2 endpoints the
topmost of the above options has priority.  (When no configuration
is specified on at least one endpoint there is no conflict.)

To provide control over conflict management you could have another
option, say, --nanosecond-force, to force your endpoint's choice.  If
both ends force, then whichever choice comes later in the list
(ignore, update, heuristic, check) has priority.
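
The precedence rules can be sketched in a few lines of Python.  This is
only an illustration: resolve_ns_mode and its arguments are
hypothetical, None stands for "nothing configured on that endpoint",
and falling back to "ignore" when neither end says anything is my
assumption based on ignore being the default.

```python
# Order matters: earlier entries win a plain conflict, later entries
# win when both endpoints force their choice.
ORDER = ["ignore", "update", "heuristic", "check"]


def resolve_ns_mode(a, a_force, b, b_force):
    """Pick the effective mode from two endpoints' settings.

    a and b are mode names from ORDER, or None when that endpoint has
    no configuration.  a_force and b_force model the proposed
    --nanosecond-force option.
    """
    if a is None and b is None:
        return "ignore"  # assumed default when nothing is configured
    if a is None:
        return b  # only one side configured: no conflict
    if b is None:
        return a
    if a_force and not b_force:
        return a  # a single forcing endpoint gets its way
    if b_force and not a_force:
        return b
    # Both force: the later option wins.  Neither forces: the topmost
    # (earliest) option wins.
    pick = max if (a_force and b_force) else min
    return pick(a, b, key=ORDER.index)
```

For example, "check" on one end against "update" on the other gives
"update" with no forcing, and "check" when both ends force.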

I don't know how this would work with the existing rsync protocol.
Perhaps it'd be easier to have only the destination end's config
matter, although this does not provide a lot of flexibility from the
command line.  The motivation is to be able to keep things simple,
or as simple as they can be.  Already my ideas seem overly complicated.
Perhaps someone can improve them.

It makes some sense to be able to configure nanosecond-related
behavior on a per-directory (i.e., mountpoint) basis, as a substitute
for knowing about every possible filesystem type and being able to
detect the filesystem type.  But this introduces yet more complication.

The default is backward compatible.  Distros could set their
own default in the rsync.conf file they install.

---

Maybe the thing to do is to give up on runtime complication and
just test the destination filesystem when rsync initializes.
This would be on by default but could be turned off from the command
line.

Since the problem is with destination filesystems that don't support
nanoseconds, and destination filesystems are by definition written to,
rsync could test for nanosecond support once at startup.  Write a file
whose timestamp has non-zero nanoseconds, read the timestamp back, and
see whether the nanoseconds come back as zero.  Then delete the test
file.  Easy if the destination is empty, harder if there's an existing
directory hierarchy.  (But rsync can already tell if it's crossing a
filesystem boundary so....)
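
A minimal sketch of such a probe, in Python rather than rsync's C, and
assuming the process may create a file in the destination directory.
The function name and the ".nsprobe-" prefix are made up.  It relies on
os.utime() with the ns= argument and os.stat().st_mtime_ns, both of
which are in the standard library.

```python
import os
import tempfile

NS_PER_SEC = 1_000_000_000


def supports_nanoseconds(directory):
    """Return True if a sub-second mtime survives a round trip on the
    filesystem holding `directory`.

    A temporary probe file is created and always removed again.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix=".nsprobe-")
    try:
        os.close(fd)
        # Any timestamp with a non-zero nanosecond part will do.
        probe_ns = NS_PER_SEC + 123_456_789
        os.utime(path, ns=(probe_ns, probe_ns))
        # If the filesystem dropped the fraction, this reads back as 0.
        return os.stat(path).st_mtime_ns % NS_PER_SEC != 0
    finally:
        os.unlink(path)
```

The check is only "did any fraction of a second survive", so a
filesystem that keeps, say, microseconds but not nanoseconds still
counts as supported here.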

Trouble is that unconditionally writing a file would affect directory
timestamps.  If you wait until you know you're writing to a directory
then the approach here is starting to sound suspiciously like Wayne's
heuristic....

Hope these thoughts are helpful to somebody.

Regards,

Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein