Patch to make rsync preserve access times

Gilbert E. Detillieux gedetil at cs.umanitoba.ca
Sat Sep 1 09:36:15 EST 2001


Bradley, and the rsync development team,

I came across the following message on the rsync bug tracking system, while
searching for just this feature (access time preservation)...

http://rsync.samba.org/cgi-bin/rsync/incoming?id=2509;expression=atime;user=guest

... where it is written...
> Date: Mon, 7 Feb 2000 02:59:42 -0500
> From: "Bradley M. Kuhn" <bkuhn at ebb.org>
> To: rsync-bugs at samba.org, rsync at samba.org
> Subject: Patch to make rsync preserve access times
> 
> The patch included below allows rsync to preserve access times.  The patch
> is not as clean as it should be, however, I made the patch so that I changed
> as little of the rsync internals as possible.
> 
> Please let me know if this patch is acceptable, and if it isn't, please let
> me know how I might better approach the problem.
> 
> I think that it is worthwhile to add access time preservation to rsync.

I wholeheartedly agree!  In fact, I've got a mirroring task for which rsync
would be the perfect solution iff (if and only if) it can preserve access
times.  I've also seen messages from a few others requesting this feature,
but this patch was the only one I've seen that attempts to implement it.

Since this message was from a year and a half ago, I was just wondering what
the current status of the request is?  (I noticed it's still filed under
incoming!)  Are there plans to support this feature?  (Pretty please?!)

Since it introduces a non-standard protocol extension, it would be nice to
see it officially supported, rather than having to break compatibility with
those using existing (and, more to the point, future) supported versions.

There are some minor bugs with the patch as is, however, including one with
some fairly major implications.  I'll comment on the applicable parts of the
patch below...  (However, since the time this patch had been written, you
may have improved upon it, in which case I'd be very interested to see
something newer and cleaner.)

> *** ./flist.c	Wed Jan 26 23:53:39 2000
> --- ./flist.c	Mon Feb  7 00:52:17 2000
...
> ***************
> *** 201,206 ****
> --- 203,209 ----
>   	if (file->uid == last_uid) flags |= SAME_UID;
>   	if (file->gid == last_gid) flags |= SAME_GID;
>   	if (file->modtime == last_time) flags |= SAME_TIME;
> + 	if (file->accesstime == last_access_time) flags |= SAME_ACCESS_TIME;
>   
>   	for (l1=0;lastname[l1] && (fname[l1] == lastname[l1]) && (l1 < 255);l1++) ;  
>   	l2 = strlen(fname) - l1;

Setting this additional flag bit, SAME_ACCESS_TIME, should probably only be
done if (preserve_access_times && remote_version >= 25), to ensure maximum
compatibility with earlier protocol versions that won't expect the extra bit.
Also, it would minimize the impact of this change when the feature is not
explicitly requested.  I.e...

+	if (preserve_access_times && remote_version >= 25 && file->accesstime == last_access_time) flags |= SAME_ACCESS_TIME;

There's also a much more important issue with respect to this flag (below).

> *** ./rsync.c	Mon Jan 24 04:13:39 2000
> --- ./rsync.c	Sun Feb  6 21:01:26 2000
...
> ***************
> *** 162,181 ****
>   		st = &st2;
>   	}
>   
> - 	if (preserve_times && !S_ISLNK(st->st_mode) &&
> - 	    st->st_mtime != file->modtime) {
> - 		/* don't complain about not setting times on directories
> - 		   because some filesystems can't do it */
> - 		if (set_modtime(fname,file->modtime) != 0 &&
> - 		    !S_ISDIR(st->st_mode)) {
> - 			rprintf(FERROR,"failed to set times on %s : %s\n",
> - 				fname,strerror(errno));
> - 			return 0;
> - 		} else {
> - 			updated = 1;
> - 		}
> - 	}
> - 
>   	change_uid = am_root && preserve_uid && st->st_uid != file->uid;
>   	change_gid = preserve_gid && file->gid != (gid_t) -1 && \
>   				st->st_gid != file->gid;
> --- 166,171 ----
> ***************
> *** 215,220 ****
> --- 205,233 ----
>   	}
>   #endif
>       
> + 	/* If we don't have a symbolic link, then we have to see if we need
> + 	** to update either the modification time, or the access time,
> + 	** depending on the user-given flags and whether the given parameter
> + 	** is actually different.  We also do this last in case the chown()
> + 	** effects access times.
> + 	*/
> + 	if (!S_ISLNK(st->st_mode) &&
> + 		( (preserve_times && st->st_mtime != file->modtime) ||
> + 		  (preserve_access_times && st->st_atime != file->accesstime)))
> + 	{
> + 		/* don't complain about not setting times on directories
> + 		   because some filesystems can't do it */
> + 		if (set_modtime_and_accesstime(fname,file->modtime,
> + 			file->accesstime) != 0 &&
> + 		    !S_ISDIR(st->st_mode)) {
> + 			rprintf(FERROR,"failed to set times on %s : %s\n",
> + 				fname,strerror(errno));
> + 			return 0;
> + 		} else {
> + 			updated = 1;
> + 		}
> + 	}
> + 
>   	if (verbose > 1 && report) {
>   		if (updated)
>   			rprintf(FINFO,"%s\n",fname);

The above gets rejected by patch when applied against 2.4.6 source, since
the modtime comparison is now down via a function called cmp_modtime().
A slight change to that hunk fixes the problem...

-----------------------------------------------------------------------------
***************
*** 162,181 ****
  		st = &st2;
  	}
  
- 	if (preserve_times && !S_ISLNK(st->st_mode) &&
- 	    cmp_modtime(st->st_mtime, file->modtime) != 0) {
- 		/* don't complain about not setting times on directories
- 		   because some filesystems can't do it */
- 		if (set_modtime(fname,file->modtime) != 0 &&
- 		    !S_ISDIR(st->st_mode)) {
- 			rprintf(FERROR,"failed to set times on %s : %s\n",
- 				fname,strerror(errno));
- 			return 0;
- 		} else {
- 			updated = 1;
- 		}
- 	}
- 
  	change_uid = am_root && preserve_uid && st->st_uid != file->uid;
  	change_gid = preserve_gid && file->gid != (gid_t) -1 && \
  				st->st_gid != file->gid;
--- 166,171 ----
***************
*** 215,220 ****
--- 205,233 ----
  	}
  #endif
      
+ 	/* If we don't have a symbolic link, then we have to see if we need
+ 	** to update either the modification time, or the access time,
+ 	** depending on the user-given flags and whether the given parameter
+ 	** is actually different.  We also do this last in case the chown()
+ 	** effects access times.
+ 	*/
+ 	if (!S_ISLNK(st->st_mode) &&
+ 		( (preserve_times && cmp_modtime(st->st_mtime, file->modtime) != 0) ||
+ 		  (preserve_access_times && cmp_modtime(st->st_atime, file->accesstime) != 0)))
+ 	{	/* not sure if we should call cmp_modtime with access time */
+ 		/* don't complain about not setting times on directories
+ 		   because some filesystems can't do it */
+ 		if (set_modtime_and_accesstime(fname,file->modtime,
+ 			file->accesstime) != 0 &&
+ 		    !S_ISDIR(st->st_mode)) {
+ 			rprintf(FERROR,"failed to set times on %s : %s\n",
+ 				fname,strerror(errno));
+ 			return 0;
+ 		} else {
+ 			updated = 1;
+ 		}
+ 	}
+ 
  	if (verbose > 1 && report) {
  		if (updated)
  			rprintf(FINFO,"%s\n",fname);
-----------------------------------------------------------------------------

Now, on to the nasty bit of code that caused me some grief and head
scratching...

> *** ./rsync.h	Sat Jan 29 18:49:36 2000
> --- ./rsync.h	Sun Feb  6 20:19:53 2000
> ***************
> *** 45,53 ****
>   #define SAME_NAME SAME_DIR
>   #define LONG_NAME (1<<6)
>   #define SAME_TIME (1<<7)
>   
>   /* update this if you make incompatible changes */
> ! #define PROTOCOL_VERSION 24
>   #define MIN_PROTOCOL_VERSION 15
>   #define MAX_PROTOCOL_VERSION 30
>   
> --- 45,58 ----
>   #define SAME_NAME SAME_DIR
>   #define LONG_NAME (1<<6)
>   #define SAME_TIME (1<<7)
> + #define SAME_ACCESS_TIME SAME_TIME
> + /* It would be nice if SAME_ACCESS_TIME could be seperate from SAME_TIME,
> + ** since access times change more frequently than modification times, however
> + ** there is no room left in the flag byte for access time to have its own flag.
> + */
>   
>   /* update this if you make incompatible changes */
> ! #define PROTOCOL_VERSION 25
>   #define MIN_PROTOCOL_VERSION 15
>   #define MAX_PROTOCOL_VERSION 30
>   

A good idea, in theory, but SAME_TIME and SAME_ACCESS_TIME cannot be shared!
It's unfortunate that we've run out of bits in the flag byte, but we really
need two separate bits for this to work at all.  This would imply adding a
second byte of flag bits which would be mandatory for protocol versions 25
and up, and excluded for lesser versions (to retain compatibility).

The reason the bits can't be shared is that anytime EITHER the mod time OR
the access time of two consecutive files in the list are the same, it tells
the remote side that BOTH are the same, which is often not the case.  File
access times for many consecutive files will be the same, for example, if
they were all read in rapid succession by a given command; however, their
mod times could be (and often will be) vastly different.  This could result
in a target file list that's much larger than necessary, but could also
result in files that should be in the list being missed.  I.e. one or the
other of the timestamps will end up being bogus on the remote system, which
kind of screws up the comparisons, not to mention the resulting timestamps.

The problem was also made worse by the fact that the SAME_ACCESS_TIME flag
was being set anytime access times of consecutive files were the same,
rather than just if the --access-times option was also set.

My quick-hack solution was to just set SAME_ACCESS_TIME to 0, effectively
making it impossible for that flag to ever be set.  This works around the
bug introduced above, but there's a slight performance penalty in the
transfer of the file list, since you'll be transfering the (4 byte) access
time for each file every time (as long as the --access-times option is set
and the protocol versions support it), regardless of whether or not it's the
same as that of the previous file.

The better solution, as suggested above, would be to add another flag byte.

Anyway, thanks for the useful patch - I certainly didn't have the time to go
grubbing through all the rsync code myself to figure out how to implement
this on my own.  I hope you find my suggestions useful.  (Sorry about the
length of the e-mail, though.)

I would be interested to hear back from you as to the current status of this
feature request (which I really hope gets included), and would appreciate
any feedback you've got on my suggestions or on any work you've done on this
code since February 2000.

Cheers!

-- 
Gilbert E. Detillieux		E-mail:	<gedetil at cs.umanitoba.ca>
Dept. of Computer Science	Web:	http://www.cs.umanitoba.ca/~gedetil/
University of Manitoba		Phone:	(204)474-8161
Winnipeg, MB, CANADA  R3T 2N2	Fax:	(204)474-7609
"Cautionary tales don't end with 'It was SO COOL!'" - Malcolm in the Middle




More information about the rsync mailing list