Keep one BIG file in sync

Oliver Krause krauseo at gmx.net
Sat Feb 23 08:50:10 EST 2002


Thanks to all who replied.

So to conclude: rsync does the delta detection I need, but it writes the
destination file completely anew instead of patching the deltas in place.

Unfortunately I have an offline database for which I need a tool that does a
simple compare of fixed-size blocks and modifies only the changed blocks in
the destination file in place. So rsync doesn't do the trick for me.

If anyone knows of a tool which can do that I would be glad to hear about it;
otherwise I think a small C program that simply compares blocks and updates
the changed ones will do, because I have enough bandwidth between the two
copies.
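
A rough sketch of such a block-compare-and-update program might look like the
following. The assumptions here are not from the thread: both copies are
reachable from one machine (e.g. the source exported over a network mount), a
fixed 64 KB block size, and large-file support for the >500G file.

/* Rough sketch: compare fixed-size blocks of <source> and <destination>
 * and rewrite, in place, only the blocks that differ. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define BLOCK_SIZE (64 * 1024)

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <source> <destination>\n", argv[0]);
        return 1;
    }

    FILE *src = fopen(argv[1], "rb");
    FILE *dst = fopen(argv[2], "r+b");   /* open for in-place update */
    if (!src || !dst) {
        perror("fopen");
        return 1;
    }

    static char sbuf[BLOCK_SIZE], dbuf[BLOCK_SIZE];
    off_t block = 0, changed = 0;
    size_t n;

    while ((n = fread(sbuf, 1, BLOCK_SIZE, src)) > 0) {
        size_t m = fread(dbuf, 1, n, dst);
        /* rewrite this block only if it differs (or the destination is short) */
        if (m != n || memcmp(sbuf, dbuf, n) != 0) {
            if (fseeko(dst, block * BLOCK_SIZE, SEEK_SET) != 0 ||
                fwrite(sbuf, 1, n, dst) != n) {
                perror("write");
                return 1;
            }
            fseeko(dst, (block + 1) * (off_t)BLOCK_SIZE, SEEK_SET);
            changed++;
        }
        block++;
    }

    /* Note: if the source shrank, the destination's old tail is left behind;
     * a real tool would also truncate the destination to the source length. */
    printf("%lld of %lld blocks rewritten\n",
           (long long)changed, (long long)block);
    fclose(src);
    fclose(dst);
    return 0;
}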

Oliver

========================

Keep in mind that rsync always generates a complete copy of the file on the
receiver side; it doesn't actually modify the file in place. It takes chunks
out of the original file and combines them with the changed chunks sent from
the sender to make a new file, and when it's all done it deletes the old one
and moves the new one into place.
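
Very roughly, that final step is the usual write-then-rename pattern (a
generic illustration, not rsync's actual code):

/* Generic illustration of the replace-when-done step described above.
 * The reconstructed file is written under a temporary name and only
 * renamed over the old copy once it is complete, so readers never see
 * a half-built file. */
#include <stdio.h>

int replace_when_done(const char *tmp_path, const char *final_path)
{
    /* ... the receiver has already written the new file to tmp_path ... */
    if (rename(tmp_path, final_path) != 0) {   /* atomic swap on POSIX */
        perror("rename");
        return -1;
    }
    return 0;
}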

- Dave Dykstra


On Friday 22 February 2002 01:27, Joseph Annino wrote:
> I don't think rsync could do that.  I would think it could possibly be
> efficient about transferring files where new data kept on being appended at
> the end, if you used some tricky combo of command line switches with
> --partial and other hacks.
>
> The big problem is that when diffs are done the usual way, you need to
> compare every byte in both files to find the deltas.  So in a network
> situation you wouldn't save any effort, because everything would have to go
> across the network anyhow, so why not just copy the file?
>
> An algorithm could divide the files into chunks, calculate a CRC for each
> chunk on both sides, compare the lists, and then recursively repeat the
> process for each chunk that didn't match, until it reached some minimal
> chunk size where it reverted to a byte-by-byte diff (a rough sketch of this
> idea follows the quoted thread below).
>
> I would think the chunk sizes would need to be chosen carefully based on
> how the file is modified, and this method would be less efficient for files
> with a large percentage of modifications evenly distributed across a file.
>
> I'm sure being able to do this kind of thing would be interesting for people
> trying to keep server farms of read-only databases going.  A highly
> scalable search engine would be such an application.  Although in that case
> you might want to keep a copy of the old file around to compare locally for
> diffs, and then send the diff files out to all the servers, since they are
> all in the same known state.
>
> Anyhow, it's an interesting problem, but I know of nothing out there that
> works this way.
>
> On 2/21/02 4:37 PM, "Oliver Krause" <krauseo at gmx.net> wrote:
> > Hi,
> >
> > after some searching I didn't come up with an answer, so please excuse me
> > if this is a total newbie question.
> >
> > My problem:
> > I have server A which has a big (>500G) database-like file. On server B I
> > want to have a copy of this file which I don't want to copy in full each
> > time; instead I want to sync the deltas so that only the changes are
> > written, once a day. Bandwidth between A and B isn't the problem. The sync
> > should be as fast as possible.
> >
> > So I want to achieve something like a binary diff -u and patch.
> >
> > Is rsync the right tool for this? Or will the rsync mechanism create too
> > much overhead to sync this file? Writing only the deltas is key for me.
> > Is there a better method?
> >
> > Thanks ... Oliver
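
For reference, here is a rough local-only sketch of the recursive
chunk-checksum idea described in the quoted message above. The checksum, the
minimum chunk size and all names are illustrative only; a real networked
version would exchange checksums between the two machines rather than reading
both files from one place, and would use a proper CRC or MD4.

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/types.h>

#define MIN_CHUNK (64 * 1024)

/* Cheap placeholder checksum over [off, off+len) of one file. */
static unsigned long chunk_sum(FILE *f, off_t off, off_t len)
{
    unsigned char buf[8192];
    unsigned long sum = 0;
    fseeko(f, off, SEEK_SET);
    while (len > 0) {
        size_t want = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        size_t got = fread(buf, 1, want, f);
        if (got == 0)
            break;
        for (size_t i = 0; i < got; i++)
            sum = sum * 31 + buf[i];
        len -= (off_t)got;
    }
    return sum;
}

/* Compare a range; if it differs, split it in half and recurse until the
 * range is small enough to report (or, in a real tool, to patch). */
static void diff_range(FILE *a, FILE *b, off_t off, off_t len)
{
    if (chunk_sum(a, off, len) == chunk_sum(b, off, len))
        return;                              /* chunks match, nothing to do */
    if (len <= MIN_CHUNK) {
        printf("differs: offset %lld, length %lld\n",
               (long long)off, (long long)len);
        return;
    }
    diff_range(a, b, off, len / 2);                  /* first half  */
    diff_range(a, b, off + len / 2, len - len / 2);  /* second half */
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file-a> <file-b>\n", argv[0]);
        return 1;
    }
    FILE *a = fopen(argv[1], "rb");
    FILE *b = fopen(argv[2], "rb");
    if (!a || !b) {
        perror("fopen");
        return 1;
    }

    fseeko(a, 0, SEEK_END);
    off_t len = ftello(a);        /* assumes both files are the same size */
    diff_range(a, b, 0, len);

    fclose(a);
    fclose(b);
    return 0;
}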



