[PATCH] Consider nanoseconds when quick-checking for unchanged files

Fri Jan 22 07:01:29 UTC 2016

    > Date: Wed, 20 Jan 2016 23:04:20 -0800
    > From: Wayne Davison <wayned at samba.org>

    > On Thu, Dec 25, 2014 at 2:48 AM, Ingo Brückl <ib at wupperonline.de> wrote:

    > > On systems using nanoseconds differences should be taken into
    > > consideration.

    > The problem is that if you transfer from a filesystem that has nanoseconds
    > to one that does not support it, rsync would consider most of the files to
    > be constantly different, since the nanosecond values would only match if
    > the source file happened to have 0 nanoseconds. So, the logic has to be
    > improved to somehow detect such a case and treat the truncated values as
    > equal. One possible improvement would be to skip the nanosecond check if
    > the destination file has a nanosecond value of 0.  That could possibly be
    > improved if we figure out if a particular device ID supports nanoseconds
    > somehow.  I have a potential heuristic in mind that I can code up and see
    > how it works.

Here's one idea, and note an important issue with ns times and --link-dest:

(a) For each end, see if any of the files being considered already
have nonzero nanosecond parts.  If so, then that end of the transfer
supports nanosecond timing.

(b) If the sending filesystem appears to have nonzero ns parts, and
the receiving filesystem appears to have all-zero ns parts (including
any directories under consideration), the receiver may still support
ns times, but have been synchronized from a filesystem that didn't.
We don't want to perpetuate that on the -next- sync, however, so we
can't just disallow ns times on the receiver, or we'll never try them
again.

(c) In case (b) above, therefore, if any file to be transmitted has a
ns time, transfer it and then immediately check the received file's
timestamp.  If its ns time is still zero, then the receiving
filesystem doesn't support it, so disable ns times during the
transfer.  If it's nonzero, then enable.  (I am eliding the pipelining
that happens during an actual rsync; that may have to be dealt with
somehow.)  Also, check the directory mod time, and see if -that's- now
nonzero; you have a very small chance of it being zero if ns times are
supported, and you can check for being in or near that window.  And
the first time it's nonzero in this filesystem, you know it'll work
for everything else in this fs.*

* "This filesystem" assumes either that you can detect mountpoints,
or that the heuristic should be applied per-directory, and that no
directory has a single-file mountpoint that doesn't support it, etc.
I assume rsync must already have some sort of logic like this for
dealing with xattr support per-fs, etc.  If this is flaky to do,
then you might need --[no]ns-timing switches to force rsync to do
the right thing without complaining on every single file if it guesses
wrong.
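
To make steps (a) through (c) concrete, here is a rough Python sketch
(not rsync code, and not a claim about how rsync works internally).
All of the function names are made up for illustration.  The probe
stamps a scratch file rather than checking a file that was actually
transferred, which is a simplification of (c), and it ignores the
pipelining and per-mountpoint questions entirely.  Note that it
reports any sub-second support as success, so a filesystem with only
microsecond resolution would also pass.

```python
import os
import tempfile

NS_PER_SEC = 1_000_000_000


def has_nonzero_ns(paths):
    """Step (a): True if any existing file shows a nonzero ns part."""
    return any(
        os.stat(p, follow_symlinks=False).st_mtime_ns % NS_PER_SEC
        for p in paths
    )


def probe_ns_support(directory):
    """Step (c), simplified: stamp a scratch file with a nonzero ns
    part and see whether the filesystem hands it back."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        wanted = 1_000_000_000 * NS_PER_SEC + 123_456_789
        os.utime(path, ns=(wanted, wanted))
        return os.stat(path).st_mtime_ns % NS_PER_SEC != 0
    finally:
        os.unlink(path)


def mtimes_match(src_ns, dst_ns, dst_keeps_ns):
    """Hypothetical quick-check rule: compare at full precision only
    when the receiving side is known to keep ns times; otherwise
    compare whole seconds."""
    if dst_keeps_ns:
        return src_ns == dst_ns
    return src_ns // NS_PER_SEC == dst_ns // NS_PER_SEC
```

The idea would be to call has_nonzero_ns() on each end first, fall
back to probe_ns_support() only in case (b), and feed the answer into
mtimes_match() for the rest of that filesystem.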

I don't know if the rsync protocol is flexible enough to dynamically
enable or disable this capability partway through a transfer.  If it
isn't, then there's an even more hackish approach, which is to add,
and unconditionally attempt to honor, a --ns-times-valid sort of
switch.  Users can then use the heuristics above in a dummy transfer
to know whether to set that switch for the real transfer.  (Or they
may know out-of-band that their FS supports ns times.)  But I'd think
such a switch and workaround should be last resorts.

I would really like to see ns times supported.  I use dirvish to back
up filesystems, which uses rsync, and if I ever have to restore any
files from that (which I do more often due to accidentally deleting or
bashing a file than due to media failure), I lose the ns timestamps,
and they're sometimes extremely valuable forensically when I'm trying
to debug something else.  Having them be 0 when I thought they shouldn't
be has more than once cost me time until I realized that I'm looking
at files that were rsync'ed from another host (either to duplicate
a setup, or from a backup) and rsync didn't preserve the ns times.

Unfortunately, of course, if rsync gets fixed now, it -will- consider
every single backed-up file in my dirvish vaults to be "new" and will
insanely bloat the vault (doubling its size) the next time it runs,
and then I'll have to tell faster-dupemerge how to re-merge all that
stuff, too.  (After all, even if the file contents haven't changed,
its metadata has, so --link-dest is required to create a new copy
of the file rather than hardlink to one with a different timestamp.)

What I'd -really- like is for some sane interaction with --link-dest
as well (which probably requires another switch, alas), which
basically says "a change from ns-0 to ns-other with no other changes
to the file is considered the same file---update the timestamp to the
new ns time but don't break the hardlink", with a way of forcing that
off for people who aren't in my situation and do care about such a
change.  Failing that, I'd need to do something like (a) run a backup
in non-ns mode by force, then (b) immediately re-run the backup in
ns-mode -on the same output directories-, e.g., -not- using
--link-dest to create new dirvish vaults.  This should get the times
resynchronized without breaking all the hardlinks to the previous
backups.  (I suspect that this would force ns times into files dozens
of generations back in the vaults, since those hardlinks would all
share metadata, but that's okay and in fact desirable.)
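
The --link-dest rule I'm asking for could look something like the
following Python sketch.  This is purely illustrative: the function
name, the "lenient" parameter (standing in for whatever switch would
turn the behavior off), and the three return values are my own
inventions, and "other_attrs_equal" stands for whatever size, mode,
and ownership checks rsync already makes.

```python
NS_PER_SEC = 1_000_000_000


def link_dest_decision(src_mtime_ns, ref_mtime_ns, other_attrs_equal,
                       lenient=True):
    """Decide what to do with a file that has a candidate in the
    --link-dest directory.

    Returns "link", "link-and-update-time", or "copy".
    """
    if not other_attrs_equal:
        return "copy"
    if src_mtime_ns == ref_mtime_ns:
        return "link"
    same_second = src_mtime_ns // NS_PER_SEC == ref_mtime_ns // NS_PER_SEC
    ref_ns_is_zero = ref_mtime_ns % NS_PER_SEC == 0
    # ns-0 to ns-other within the same second: treat as the same file,
    # keep the hardlink, and just refresh the timestamp.
    if lenient and same_second and ref_ns_is_zero:
        return "link-and-update-time"
    return "copy"
```

With lenient=False this degenerates to a strict comparison, which is
what people who do care about such a change would want.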

Note that this change in rsync behavior would thus appear to need a
pretty big warning in the changelog and new-version announcements,
telling people who use --link-dest (which I assume means by-hand,
via dirvish, and via rsnapshot, at least) that they need some sort
of workaround (TBD) so as not to have their backups suddenly
explode in both time and space.  I -still- think I'd like to see
ns times in rsync, despite this caveat---the longer it's delayed, the
worse the situation gets.  (A coordinated change in the most-popular
tools that use --link-dest to implement a workaround or at least
warn the user also seems wise; otherwise, those who upgrade their
OS and get a new version of rsync that way, without reading release
notes, may be surprised.  Which means such tools need a way of knowing
which rsync implements ns times, presumably by adding it to the
"Capabilities" output of --version or something.  Unless, of course,
the ns-0-to-ns-other-means-same-file-for-link-dest is the default,
which I think is what I'd recommend, as long as there's a way to
turn that off and it's well-documented.)

P.S. The current situation also means that faster-dupemerge can't use
that information, either, because I can't trust it to be correct
across hosts in such situations.  [I made a version of f-d that
respected ns times, only to abandon it when I realized that rsync
wasn't preserving them!]  I merge -across- vaults with f-d to catch
files that are the same on multiple backed-up hosts, or to catch
pushing a file from one host to another and deleting it from the
original host, or to merge identical files on the same host in the backup
even if they aren't merged on the host being backed up.

[Paul Slootman's request for FAT filesystems would be a generalization
of this sort of strategy, although I'd think that in that case it's a
lot more obvious to the user invoking rsync that the fs is FAT.]