Keep one BIG file in sync

Joseph Annino jannino at jannino.com
Fri Feb 22 11:27:20 EST 2002


I don't think rsync could do that.  I would think it could possibly be
efficient about transferring files where new data kept on being appended at
the end, if you used some tricky combo of command line switches with
--partial and other hacks.
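
To make the append-only case concrete, here is a minimal sketch in Python of copying just the new tail of a file. It does not use rsync at all. The function name `append_tail` is hypothetical. It assumes the destination is an exact prefix of the source (the file only ever grows at the end), and it does not check that assumption.

```python
import os

def append_tail(src_path: str, dst_path: str, block_size: int = 1 << 20) -> int:
    """Copy only the bytes of src_path that lie beyond dst_path's current length.

    Hypothetical helper. ASSUMES dst_path is an exact prefix of src_path,
    i.e. the source only ever grows by appending. Nothing here verifies that.
    Returns the number of bytes copied.
    """
    dst_len = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    copied = 0
    # "ab" creates dst_path if it is missing and always writes at its end.
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(dst_len)  # skip the part the destination already has
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

A real tool would at least compare a checksum of the overlapping prefix before trusting it.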

The big problem is that, the way diffs are usually done, you need to compare
every byte in both files to find the deltas.  So in a network situation you
wouldn't save any effort, because everything would have to go across the
network anyhow, so why not just copy the file?

What might work is an algorithm that divides the files into chunks,
calculates the CRCs for each chunk on both sides, compares the lists, and then
recursively repeats this process for each chunk that didn't match, until it
reaches some minimal chunk size where it reverts to a byte-by-byte diff.
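
A minimal sketch of that recursive CRC comparison, with both "sides" simulated in one process. In a real deployment each machine would compute its own CRC list and only the lists would cross the network. The names `find_changed`, `crc_ranges`, `n_chunks` and `min_chunk` are hypothetical. It assumes both files have the same length (in-place modifications only), and it treats matching CRCs as "unchanged", so a CRC collision could in principle hide a modification.

```python
import zlib

def crc_ranges(data: bytes, start: int, end: int, n_chunks: int):
    """Split data[start:end] into up to n_chunks pieces; return (lo, hi, crc32) per piece."""
    size = max(1, -(-(end - start) // n_chunks))  # ceiling division
    out = []
    for lo in range(start, end, size):
        hi = min(lo + size, end)
        out.append((lo, hi, zlib.crc32(data[lo:hi])))
    return out

def find_changed(a: bytes, b: bytes, start: int = 0, end=None,
                 n_chunks: int = 4, min_chunk: int = 8):
    """Recursively narrow down the byte offsets where a and b differ."""
    assert len(a) == len(b), "sketch assumes equal-length files"
    assert n_chunks >= 2, "need at least 2 chunks per level to make progress"
    if end is None:
        end = len(a)
    if end - start <= min_chunk:
        # Small enough: fall back to a byte-by-byte comparison.
        return [i for i in range(start, end) if a[i] != b[i]]
    changed = []
    for (lo, hi, crc_a), (_, _, crc_b) in zip(crc_ranges(a, start, end, n_chunks),
                                              crc_ranges(b, start, end, n_chunks)):
        if crc_a != crc_b:
            # Only mismatching chunks are subdivided further.
            changed.extend(find_changed(a, b, lo, hi, n_chunks, min_chunk))
    return changed
```

For example, two 1024-byte buffers that differ at offsets 10 and 700 yield `[10, 700]`, and identical buffers yield an empty list after a single round of CRC comparisons.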

I would think the chunk sizes would need to be chosen carefully based on how
the file is modified, and this method would be less efficient for files with
a large percentage of modifications evenly distributed across a file.
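
One way to see the chunk size effect is to count how many fixed-size chunks contain at least one modified byte, since every such chunk has to be resent or examined further. This is a back-of-the-envelope sketch; `dirty_fraction` is a hypothetical name.

```python
def dirty_fraction(file_size: int, chunk_size: int, modified_offsets) -> float:
    """Fraction of fixed-size chunks that contain at least one modified byte."""
    n_chunks = -(-file_size // chunk_size)  # ceiling division
    dirty = {off // chunk_size for off in modified_offsets}
    return len(dirty) / n_chunks
```

For a 1 MiB file split into 4096-byte chunks (256 chunks), 1000 modified bytes clustered at the start dirty only 1/256 of the chunks, while the same 1000 bytes spread evenly every 1048 bytes dirty all 256 of them, so nothing is saved over copying the whole file.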

I'm sure being able to do this kind of thing would be interesting for people
trying to keep server farms of read only databases going.  A highly scalable
search engine would be such an application.  Although in that case you might
want to keep a copy of the old file around to compare locally for diffs, and
then send the diff files out to all the servers, since they are all in the
same known state.
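
A sketch of the "diff locally, then ship the diff" idea: the master keeps the old copy, compares it block by block against the new one, and sends only the changed blocks (plus the new length) to every server. `make_patch` and `apply_patch` are hypothetical names. For simplicity the whole file is held in memory, which a 500G file obviously would not allow; a real version would stream. Fixed blocks also only suit in-place changes: an insertion that shifts data makes every later block look different.

```python
from typing import List, Tuple

# (new_length, [(offset, replacement_bytes), ...])
Patch = Tuple[int, List[Tuple[int, bytes]]]

def make_patch(old: bytes, new: bytes, block: int = 4096) -> Patch:
    """Compare old and new locally, block by block, recording only changed blocks."""
    edits = []
    for off in range(0, len(new), block):
        if new[off:off + block] != old[off:off + block]:
            edits.append((off, new[off:off + block]))
    return len(new), edits

def apply_patch(old: bytes, patch: Patch) -> bytes:
    """Apply a patch from make_patch to a replica that holds exactly `old`."""
    new_len, edits = patch
    # Truncate or zero-extend to the new length; extended regions are
    # always covered by an edit because they differ from `old`.
    buf = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for off, data in edits:
        buf[off:off + len(data)] = data
    return bytes(buf)
```

Because every replica starts from the same known state, the one patch computed on the master can be applied unchanged on each server.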

Anyhow, it's an interesting problem, but I know of nothing out there that
works this way.

On 2/21/02 4:37 PM, "Oliver Krause" <krauseo at gmx.net> wrote:

> Hi,
> 
> after some searching I didn't come up with an answer, so please excuse me if
> this is a total newbie question.
> 
> My problem:
> I have server A, which has a big (>500G) database-like file. On server B I
> want to have a copy of this file, which I don't want to copy each time but
> sync the deltas so that only the deltas are written once a day. Bandwidth
> between A and B isn't the problem. The sync should be as fast as possible.
> 
> So I want to achieve something like a binary diff -u and patch.
> 
> Is rsync the right tool for this? Or will the rsync mechanism create too much
> overhead to sync this file? Writing only the deltas is key for me. Is there a
> better method?
> 
> Thanks ... Oliver
