rsync through a server storing the changes, time delayed rsync

Torbjörn Nordling tn at s3.kth.se
Sat Feb 25 14:48:23 GMT 2006


Quoting Matt McCutchen <hashproduct at verizon.net>:

> On Tue, 2006-02-21 at 11:28 -0500, Carson Gaspar wrote:
>> --On Tuesday, February 21, 2006 4:48 PM +0100 Torbjörn Nordling
>> <tn at s3.kth.se> wrote:
>>
>> > Problem:
>> > I have two computers (one at work and one home) and I want to keep them
>> > identical, but I cannot rsync them directly because when one is running,
>> > the other is turned off. I also have access to 500 MB of storage
>> > space on a server running continuously, which is not nearly enough to
>> > hold all data stored on the two computers. So my idea is to use the
>> > server for storing an rsync file list with checksums from the last update
>> > and then upload the data that has changed to the server.
>> >
>> > Scenario:
>> > Last thing I do at work is update the file list on the server and upload
>> > all data that rsync identifies as changed. Then at home I would connect to
>> > the server and download the changed data, which rsync would integrate into
>> > the correct files. Before I turn off my home computer I would again
>> > rsync the changes to the server. This would make the rsync delayed in
>> > time.
>> >
>> > Question:
>> > Has anyone tried something similar and could direct me towards how to
>> > make it work? Or is this impossible today, but something that quite
>> > simply could be built into rsync? Based on what I have read about how
>> > rsync works it seems feasible to me.
>>
>> What you're suggesting won't work. You have 2 options I can see:
>>
>> 1)
>> - Maintain a "last synced" timestamp on each client and the server.
>> - To sync to the server, find all files with an mtime newer than the last
>> synced time, and copy them onto the server, then update the sync time on
>> that client and the server.
>> - To sync from the server, check if its last sync time is different from
>> yours. If it is, copy the files and the timestamp from the server, then
>> delete the files from the server.
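The mtime scan that option 1 relies on could be sketched roughly like this (a toy Python illustration; the function name and the marker-time idea are mine, not rsync features):

```python
import os

def files_changed_since(root, last_sync_time):
    """Return paths under `root` whose mtime is newer than `last_sync_time`.

    This mirrors the "last synced" timestamp idea: each client keeps a
    marker time and uploads only files modified after it.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_sync_time:
                changed.append(path)
    return sorted(changed)
```

The weakness Matt points out below applies directly: any program that touches mtimes fools this scan, and deleted files never show up in it at all.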
>
> This arrangement doesn't give me a good feeling; it reminds me of an
> unreliable two-way synchronization technique based on rsync that I
> proposed a long time ago (remember that, Wayne? :)).  Potential issues
> include other programs that modify mtimes and file deletion.  Torbjörn,
> it sounds like you don't need a full two-way system but a
> one-way-and-then-the-other-way system; those are much easier to do.

True, it is a one-way-and-then-the-other-way system, except when I 
forget to do the sync, so for safety a two-way system would be better.

>> 2)
>> - Maintain 2 copies of the data on both clients (using snapshots would make
>> your life easier and storage requirements smaller)
>> - Use rsync to generate batch change files between the old and new copies,
>> and copy them onto the server
>> - download and apply pending batches on the other client
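In rsync itself this is done with --write-batch on one machine and --read-batch on the other. As a toy Python illustration of the idea only (dicts stand in for file trees here; this is not rsync's actual batch format, which records block-level deltas):

```python
def make_batch(old_snapshot, new_snapshot):
    """Toy stand-in for `rsync --write-batch`: record what changed
    between the old and new copies as {path: new_content}, with
    None marking a deletion."""
    batch = {}
    for path, content in new_snapshot.items():
        if old_snapshot.get(path) != content:
            batch[path] = content
    for path in old_snapshot:
        if path not in new_snapshot:
            batch[path] = None
    return batch

def apply_batch(snapshot, batch):
    """Toy stand-in for `rsync --read-batch`: replay the recorded
    changes onto another copy of the data."""
    result = dict(snapshot)
    for path, content in batch.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result
```

The point of the real batch file is that it is small (only the deltas), which is exactly what makes the 500 MB server workable as a relay.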
>
> That would work nicely and reliably if you have the space.
>
> Here's another technique you might consider if you have a high enough
> inode quota on the server.  Get an rsync that supports the --times-only
> quick-check behavior (e.g., apply just the times-only piece of the
> experimental filter patch) and use it.  Keep the entire file structure
> on the server with the correct attributes, but make all the files
> zero-length.  To upload, just upload; if a file is changed on your
> computer, its real data will overwrite the zero-length data on the
> server, while unchanged files will match mtimes with zero-length files
> on the server and their data won't be uploaded to take up server space.
> To download, just download; the real and newer data on the server will
> overwrite the data on your computer, while zero-length files on the
> server will match mtimes with (and not overwrite) real files on your
> computer.  Every so often, truncate the files on the server back to zero
> length after you download.
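The truncate-back-to-zero step could be scripted along these lines (a hedged sketch to run on the server copy after downloading; plain Python, nothing rsync-specific):

```python
import os

def truncate_to_placeholders(root):
    """Empty every file under `root` while preserving its mtime, so an
    unchanged file still passes a --times-only quick check against the
    zero-length placeholder but takes no space on the server."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            with open(path, "w"):
                pass  # opening with "w" truncates to zero length
            os.utime(path, (st.st_atime, st.st_mtime))  # restore times
```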
> --
> Matt McCutchen
> hashproduct at verizon.net
> http://hashproduct.metaesthetics.net/

The last sounds doable, but then it would again depend on mtimes, and 
other programs could mess it up, couldn't they? Taking inspiration from 
the two-copies suggestion, which unfortunately is not doable (too much 
data, ~40 GB):

1. Store a file list with checksums locally, updated as the last step 
of a sync.
2. Sync by first comparing against the local last-sync file list to 
detect changes, which are to be uploaded to the server.
3. Compare these changes with the changes stored on the server (a file 
list containing only checksums of the changed files) to detect possible 
conflicts caused by the same file being changed on both sides.
4. Download the changes (the rsync'd changed parts of files) from the 
server and merge them into the existing files. (I suppose that if a 
conflict occurs, the only practical option is to keep the local 
version, since only the changed parts of files exist on the server.)
5. Upload the final changes made to the local files (including 
manually resolved conflicts) together with the file list containing 
only checksums of the changed files.
6. Update the local file list of checksums.

Would this do the trick? It maintains the strength of rsync, sending 
only the changed parts of files, while locally only having to store 
checksums from the last sync instead of two copies of the files.
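Steps 1-3 of the scheme could be sketched like this (hypothetical helper names, not existing rsync machinery; a real implementation would also have to track deletions and decide where to store the manifests):

```python
import hashlib
import os

def build_manifest(root):
    """Step 1/6: map each path relative to `root` to an MD5 checksum
    of its contents -- the "file list with checksums"."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                manifest[rel] = hashlib.md5(f.read()).hexdigest()
    return manifest

def changed_files(last_sync_manifest, current_manifest):
    """Step 2: files whose checksum differs from the last-sync list
    (new files count as changed, since they have no old checksum)."""
    return {path for path, digest in current_manifest.items()
            if last_sync_manifest.get(path) != digest}

def conflicts(local_changes, server_changes):
    """Step 3: files changed on both sides since the last sync."""
    return local_changes & server_changes
```

Storing only checksums rather than a second copy keeps the local overhead to a few bytes per file, which is the whole attraction over the two-copies approach.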




