[Bug 8529] New: Extend --batch to a local cache for backups

samba-bugs at samba.org samba-bugs at samba.org
Fri Oct 14 13:59:37 MDT 2011


https://bugzilla.samba.org/show_bug.cgi?id=8529

           Summary: Extend --batch to a local cache for backups
           Product: rsync
           Version: 2.6.9
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: samba-bugzilla at ch.pkts.ca
         QAContact: rsync-qa at samba.org


I'm backing up my computer to a server.  I've got almost a terabyte of data in 
about 100k files, and they don't change much each day.  When rsync runs, it
traverses the destination directory on the server to find changes, which chews
up a lot of network bandwidth, CPU time, and disk seeks compared to the amount
of actual data to send, if any.

After reading the manual page, the --write-batch/--read-batch options look
promising.  As I'm the only one writing to the destination directory, I could
cache the current state of the destination directory in a local file, then
generate a batch file based on that.  This would cut out 99% of the overhead
I'm seeing now, reducing a 6-hour rsync to maybe 15 minutes (i.e. the time to
traverse the source filesystem).

Alternatively I could do:
   find /source-directory -type f -printf '%P\t%s\t%T@\n' | sort > new.txt
   comm -13 old.txt new.txt | cut -f1 > todo.txt
   rsync -av --files-from=todo.txt /source-directory /remote-dest-directory \
      && mv new.txt old.txt
and pray there are no filenames with funky characters in them.
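
A NUL-delimited variant of the same idea might avoid the filename problem.
The following is only a sketch, assuming GNU find, sort and sed, plus a
coreutils recent enough that comm accepts -z.  The function name
changed_files and the old.lst/new.lst/todo.lst file names are made up for
illustration.  It only detects new or modified files, not deletions.

```shell
#!/bin/sh
# Sketch: list files that are new or changed since the last successful run.
# Records are NUL-terminated so that spaces, tabs and newlines in file names
# survive.  changed_files and the *.lst names are placeholders.

changed_files() {
    src=$1      # source directory to scan
    state=$2    # existing directory holding old.lst, new.lst and todo.lst

    # First run: start from an empty cache so every file counts as new.
    [ -e "$state/old.lst" ] || : > "$state/old.lst"

    # One record per file: "size mtime relative-path", NUL-terminated.
    find "$src" -type f -printf '%s %T@ %P\0' |
        LC_ALL=C sort -z > "$state/new.lst"

    # Records present only in new.lst belong to new or modified files.
    # Strip the two leading metadata fields to keep just the path.
    LC_ALL=C comm -z -13 "$state/old.lst" "$state/new.lst" |
        sed -z 's/^[^ ]* [^ ]* //' > "$state/todo.lst"
}
```

With a state directory of my choosing (here ~/.backup-state, which must
already exist), it would be used roughly like this, relying on rsync's
--from0 option to read the NUL-delimited list:
   changed_files /source-directory ~/.backup-state \
      && rsync -av --from0 --files-from="$HOME/.backup-state/todo.lst" \
            /source-directory /remote-dest-directory \
      && mv ~/.backup-state/new.lst ~/.backup-state/old.lst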

Is this caching feature simple enough to implement?  

Thanks!
