failed to set times on ... Invalid argument (22) and what to do with it

Georgy Fedorov gfedorov at unimelb.edu.au
Sun Apr 9 09:23:12 UTC 2017


Dear All,

Along with the files that suddenly disappear, we have a bit of a problem 
with these that do not.
Namely, in my test runs I can see a small but stable set of files, that 
rsync is repeatedly trying to transfer, and then repeatedly fails to 
updates their times ; and then the story repeats itself.

The situation is illustrated by the log snippet below, where I have 
changed file names to protect owners privacy. In the tests, I am using 
rsync 3.1.1 since this is what is presently supported by the OS release 
maintainers, and I was trying to avoid any new issues if possible. But 
in my understanding, rsync 3.1.2 would have had the same problem here.

This issue is in many ways similar to bug 
https://bugzilla.samba.org/show_bug.cgi?id=4977 and the like, and in 
this particular case is caused by files whose mtime nanoseconds exist on 
the filesystem and are broken. However, since there is a workaround to 
bypass a similar issue with symlinks, I believe we can do something 
about such files as well: with just a few broken timestamps on a large 
filesystem, it is very desirable to be able to avoid or suppress code 23 
('partial transfer') when all the file data and 99.9% of the timestamps 
were transferred properly.

After the log snippets below, I will try to explain the problem as I see 
it, and then suggest two different ways to address it: basically ignore 
error 22 (EINVAL) of fix nanosecond mtime .

===
...
2017/04/06 00:16:06 [16000] <f..tp..... filename1
2017/04/06 00:16:06 [16000] <f..tp..... filename2
2017/04/06 00:16:06 [16000] rsync: failed to set times on 
"filename1.ezq12g" (in module): Invalid argument (22)
2017/04/06 00:16:06 [16000] rsync: failed to set times on 
"filename2.wp5Aju" (in module): Invalid argument (22)
2017/04/06 00:16:09 [16000] <f..t...... filename3
2017/04/06 00:16:09 [16000] rsync: failed to set times on 
"filename3.8Fa3LO" (in module): Invalid argument (22)
2017/04/06 00:16:09 [16000] .d..t...... dir1/
2017/04/06 00:16:09 [16000] <f..t...... filename4
2017/04/06 00:16:09 [16000] <f.st...... filename5
2017/04/06 00:16:09 [16000] .d..t...... dir2/
2017/04/06 00:16:09 [16000] <f..t...... filename6
2017/04/06 00:16:09 [16000] .d..t...... dir3/
2017/04/06 00:16:09 [16000] <f.st...... dir3/filename7
2017/04/06 00:16:10 [16000] sent 161,021,998 bytes  received 1,858,287 
bytes  868,694.85 bytes/sec
2017/04/06 00:16:10 [16000] total size is 1,228,368,779,326  speedup is 
7,541.54
2017/04/06 00:16:10 [16000] rsync error: some files/attrs were not 
transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.1]

===

With some extra verbosity, the logs give some insight:

===
2017/04/07 18:44:41 [27597] recv_files(filename1)
2017/04/07 18:44:41 [27597] got file_sum
2017/04/07 18:44:41 [27597] set uid of filename1.ZjS8YL from 0 to *****
2017/04/07 18:44:41 [27597] set gid of filename1.ZjS8YL from 0 to *****
2017/04/07 18:44:41 [27597] set modtime of filename1.ZjS8YL to 
(370409472) Sun Sep 27 03:31:12 1981
2017/04/07 18:44:41 [27597] rsync: failed to set times on 
"filename1.ZjS8YL" (in users): Invalid argument (22)
2017/04/07 18:44:41 [27597] renaming filename1.ZjS8YL to filename1
===

Both 'cp -a filename1 somewhere_else' and 'touch -r filename1 
something_else' fail as well, and in a bug tracker we can stop here with 
a WONTFIX.

However, copying times with stat() / utime() from filename1 to its 
destination fixes all its time transfer problems once and for all, so 
certainly something can be done here.

Now I will have to apologize for the length of the letter -- and start 
discussing possible solutions.

(a) First of all, we can _fix_ the time on destination. ( I do not _own_ 
the files in question; I'm just trying to make sure the data is 
replicated from one location to another -- ideally, as accurately as 
possible. )

Debugging the GNU/touch utility code, I can see that for all these files 
.tv_nsec field of the timespec structure gets bogus values: either 
negative, or far over 1000000000 (one second), and in my opinion this 
eventually causes utimensat() / futimens() to fail .

To remind,  struct timespec is defined as follows:

===
            struct timespec {
                time_t tv_sec;        /* seconds */
                long   tv_nsec;       /* nanoseconds */
            };
===

Apart from a terrible idea to define a disk-related data field being 
dependent on the CPU word size ( both time_t and long have the word size 
on GCC ), there is an obvious problem here: while tv_sec field, by 
definition, _can_not_ have a "wrong" value (e.g. negative numbers here 
may look weird, but they are not, per se, _wrong_) -- tv_nsec field 
actually can have an unacceptable value.

First of all, it is not expected to be negative: imho on Linux 
UTIME_OMIT is (-2), that is, we ban negative nanoseconds.
Next, although the standard never says so, it is apparently not 
/expected/ for these values to be greater than one second: for 32-bit 
longs, this is not a big deal, since a couple of extra seconds may just 
lie within very common time jitter. For 64-bit longs, however, one could 
stuff the whole Unix timestamp in that field, so we would probably want 
to ban that as well for the sake of general sanity.

Now back to rsync: looking at set_modtime() definition in util.c, one 
can clearly see that rsync developers take a very sane and reasonable 
approach by declaring mod_nsec as an uint32 -- that is, both unsigned, 
and having a byte-fixed reasonable size. ( This is imho how tv_nsec 
field should have been declared! )

Furthermore, rsync 3.1.1 -- see set_file_attrs() in rsync.c -- compares 
only mtime _seconds_, and takes no attempt to adjust mtimes further if 
the "classic" .st_mtime values coincide. ( And this is why all the  
"Invalid argument (22)" problems disappear after we transfer .st_mtime 
values by an utime() call.  )

Therefore, still for rsync 3.1.1 -- first, any bogus negative value for 
.st_mtim.tv_nsec will be silently converted to a uint32 ; and second, 
any bogus value which is far over one second will be truncated.

I'll discuss version 3.1.2 in a moment, but it is rather tempting to 
agree that if we discard bogus nanosecond .st_mtim.tv_nsec values and 
set them to 0, we will in fact transfer all the same data and copy the 
timestamps as close as we can, and moreover -- will not attempt to 
transfer these files again.

This is a proposed patch to do the above: 
https://gist.github.com/anonymous/1eb789c825a8eaa1bdb4116334ba52da .

Version 3.1.2 takes a slightly more aggressive approach, and tries to 
compare the nanosecond part of the timestamp as well. However, since 
--modify-window parameter is still supported, in my understanding this 
happens only if the file was originally nominated for a transfer, so 
that the selection is still controlled by cmp_time() from util.c ; in 
other words, if the destination has the same value of .st_mtime field as 
the source, and the same contents etc -- no data transfer would be 
attempted.

One could possibly recompile rsync, telling it to ignore all nanosecond 
values whatsoever, but this is not exactly what we want here: it is 
rather desirable to transfer the existing terabytes of data as 
accurately as possible, omitting just a few files whose times were 
broken anyway.

At the moment I can not determine what makes these files to appear in 
the first place, but the fact is that they exist, and although are very 
small in numbers, can be very annoying.

(b) Another option is to allow rsync to ignore setting mtime failure, 
and to return RERR_OK irregardless of that.

The downside is that rsync will keep retrying to resend this data over 
and over again. The plus side, however, is that setting mtimes is a very 
complex task that can go wrong in a number of different ways on a number 
of platforms, so it may be desirable to let the user ignore this error 
in the case if the bulk of the data was transferred correctly.

In that case, we will probably need another rsync option, something like 
--ignore-missing-mtimes . Since choosing a proper name is a whole 
another problem, I am adding here a "wrong" patch that does not 
introduce extra rsync options, but rather extends 
'--ignore-missing-args' functionality to ignore mtime setting errors as 
well :

https://gist.github.com/anonymous/e6c95dc424837e09015104b1d5a2a59e

This patch works for both rsync 3.1.1 and 3.1.2, since the addressed 
code did not change much between the versions.

I sincerely hope that rsync can address the nanosecond mtimes issue in 
one way or another, since apparently it arises from time to time and 
when it does, can be very annoying.

Sincerely,
George Fedorov

-- 

George Fedorov
Senior Systems Specialist
Melbourne School of Engineering
The University of Melbourne, Victoria 3010, Australia
phone: *****
email: *****
http://www.eng.unimelb.edu.au/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.samba.org/pipermail/rsync/attachments/20170409/4865296d/attachment.html>


More information about the rsync mailing list