can rsync scan files only with mtime since T?

Darryl Dixon - Winterhouse Consulting darryl.dixon at winterhouseconsulting.com
Thu Aug 23 22:21:53 GMT 2007


> Hi
>
> I have a file system that contains millions of small files. Since I
> back it up every day with rsync over a slow WAN link, I think it would
> be nice if rsync could do this:
>
> An option that lets rsync check with the remote rsync daemon only about
> local files whose last modification time is newer than one day ago (so
> modified since yesterday's backup). This could greatly reduce the WAN
> traffic.
>
> Is this doable with current rsync?
>

Hi Ming, List,

I thought I'd reply as I have used rsync in a similar scenario (~1TB of 13
million files in two filesystems backed up offsite).

There are a couple of approaches that will do what you want - what OS are
you using? (Windows, Solaris, Linux ...?). One is to run 'find . -mtime -1 >
my_files.list' from the top of the source tree and then use rsync's
--files-from=my_files.list option to send only the new files. Running find
can be time consuming(!), but effectively that's what a hypothetical
'rsync -mtime -1' option would have to do anyway.
Another option (and this is the one that I used) is to audit filesystem
events as they are happening, and keep a live list of all modified files
all the time. This list can then be fed to rsync with the --files-from
option.
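
As a rough illustration of the first approach, here is a minimal Python
sketch that does the same job as 'find -mtime -1' and writes a list
suitable for --files-from. The function names are made up for this
example, and it assumes file names contain no newlines (rsync's --from0
option exists for lists separated by NUL characters instead).

```python
import os
import stat
import time


def files_modified_since(root, cutoff=None):
    """Yield paths, relative to root, of regular files modified after cutoff.

    cutoff is a Unix timestamp; it defaults to 24 hours ago.
    """
    if cutoff is None:
        cutoff = time.time() - 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # file vanished between listing and stat
            if stat.S_ISREG(st.st_mode) and st.st_mtime > cutoff:
                yield os.path.relpath(full, root)


def write_file_list(root, cutoff, list_path):
    """Write one relative path per line to list_path; return the count."""
    count = 0
    with open(list_path, "w") as fh:
        for rel in files_modified_since(root, cutoff):
            fh.write(rel + "\n")
            count += 1
    return count


def rsync_command(root, list_path, dest):
    """Build (but do not run) an rsync invocation that sends only listed files.

    Paths in the list are interpreted relative to the source directory.
    """
    return ["rsync", "-a", "--files-from=" + list_path, root, dest]
```

The list returned by rsync_command() can be handed to subprocess.run()
once the list file has been written. Keep the list file outside the tree
being scanned, otherwise it will show up in its own list.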

On Solaris this can be achieved with the BSM module and NFS logging (if
you're running an NFS server). On Linux I heavily modified pyinotify
(http://pyinotify.sourceforge.net) to achieve the same result. The outcome
is that every 5 minutes during the day I ship all the files changed in the
previous 5 minutes offsite to the backup server. This works perfectly -
and the volume of change is about 30,000 new files per day!
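
To give an idea of the bookkeeping involved, here is a minimal Python
sketch of the "live list" part. It is not the modified pyinotify code
described above: the class and function names are invented for this
example, and it assumes that some event source (an inotify watcher, an
audit log reader, etc.) calls record() for every changed file. A periodic
job then drains the accumulated set into a list file for --files-from.

```python
import os
import threading


class ChangedFileLog:
    """Thread-safe set of changed paths, stored relative to a root directory."""

    def __init__(self, root):
        self.root = os.path.abspath(root)
        self._lock = threading.Lock()
        self._paths = set()

    def record(self, path):
        """Remember that path changed; ignore the root itself and outside paths."""
        rel = os.path.relpath(os.path.abspath(path), self.root)
        if rel in (os.curdir, os.pardir) or rel.startswith(os.pardir + os.sep):
            return
        with self._lock:
            self._paths.add(rel)

    def drain(self):
        """Return the sorted batch collected so far and start a fresh one."""
        with self._lock:
            batch, self._paths = self._paths, set()
        return sorted(batch)


def write_batch(log, list_path):
    """Write the current batch, one path per line; return how many were written."""
    batch = log.drain()
    with open(list_path, "w") as fh:
        for rel in batch:
            fh.write(rel + "\n")
    return len(batch)
```

Using a set means a file modified many times within one interval is
shipped only once, and swapping the set under the lock lets events keep
arriving while the previous batch is being sent.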

Email me direct if you want more details :)

regards,
Darryl Dixon
Winterhouse Consulting Ltd
http://www.winterhouseconsulting.com
darryl.dixon at winterhouseconsulting.com



