rsync efficiency

Benjamin Schweizer 2008 at benjamin-schweizer.de
Mon Sep 22 13:01:04 GMT 2008


Hi there,

there are various challenges with this: rsync typically uses the
mtime to check whether a file has changed and, if so, it scans both the
source file and the target file and compares checksums. Thus, both files
need to be read in whole - which affects performance. This is especially
the case if you transfer database files or virtual machine images, both
typically consisting of large files that are changed at the block level.
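To make the checksum comparison concrete, here is a minimal sketch (in Python, not rsync's actual C implementation) of the rolling weak-checksum idea: once a block's checksum is known, sliding the window one byte to the right only needs the byte leaving and the byte entering, not a re-read of the whole block. The function names and the simple position-weighted hash are illustrative assumptions, not rsync's exact code:

```python
M = 1 << 16  # checksum components taken modulo 2^16

def weak_checksum(block):
    # a: plain sum of bytes; b: position-weighted sum of bytes.
    a = sum(block) % M
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    # Slide the window one byte right: drop out_byte, take in in_byte.
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

data = bytes(range(32))
a, b = weak_checksum(data[0:8])      # checksum of the first window
a, b = roll(a, b, data[0], data[8], 8)  # O(1) slide to the next window
assert (a, b) == weak_checksum(data[1:9])  # matches a full recompute
```

The cheap rolling hash is only a first filter; when it matches, a strong checksum over the block decides whether the data really is identical.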
You can speed this up (if you have a fast network connection) using the
-W (--whole-file) option. This does not scan both files but instead
starts a fresh transfer whenever a change is detected. The result is a
linear read of the source file and a linear write on the destination.
Thus, the transfer takes only as long as a linear write - and not a read
plus a parallel write on the destination side (as described above).
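As for the in-place question further down the thread: at the filesystem API level, changing a few bytes in the middle of a big file (which rsync's --inplace option does on the destination, instead of writing a temporary copy) is just a seek and a write. Whether that touches one block or, as on ZFS, propagates through a tree of copied blocks is entirely up to the filesystem. A minimal sketch, with a hypothetical helper name:

```python
import os
import tempfile

def patch_in_place(path, offset, new_bytes):
    # Overwrite new_bytes at offset; the rest of the file is untouched.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)

# Demo: change 4 bytes in the middle of a 1 MiB file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\0" * (1 << 20))

patch_in_place(path, 512 * 1024, b"DATA")

with open(path, "rb") as f:
    f.seek(512 * 1024)
    patched = f.read(4)
size = os.path.getsize(path)
os.remove(path)
```

Note that --inplace trades safety for this efficiency: if the transfer is interrupted, the destination file is left in a mixed state, which is why rsync defaults to a temporary file plus rename.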
In theory, you could speed this up further if you transferred the
journal from the source to the destination. This is what is known as log
shipping on databases. However, it would require changes to the
underlying block device driver, would be limited to specific platforms,
and would duplicate existing technologies that can already be used in
certain cases. I don't think this is the intention of rsync.
Finally, if you need a block-level sync, you could take a look at DRBD,
which implements some of the features that I've mentioned above.


Greetings
Benjamin

On Fri, Sep 19, 2008 at 04:40:05PM -0300, Marcelo Leal wrote:
>  Hello all,
>  I have a question that I think you rsync hackers can answer. ;-)
>  I have made this post on my blog:
>  http://www.posix.brte.com.br/blog/?p=312
>  to start a series about the copy-on-write semantics of ZFS. In my
> test, "vi" rewrote the whole file just to change 3 bytes, so the whole
> file was reallocated.
>  What I want to know from you is about the techniques used by rsync
> (and by other software that you know of) to change a few bytes in
> the middle of a big file. It may be a simple question for you, but I
> really wonder how rsync can change 18k inside a 1GB file without
> rewriting the whole file (or a lot of indirect blocks).
>  If we are talking about an OS without a copy-on-write filesystem,
> maybe we can rewrite just that block (?), but in ZFS, for example, if
> we have a 128K block and we need to add 10k, that change will
> propagate through the whole tree of blocks, right?
>  And I think rsync, like much other software, creates a temporary file
> on the destination, and the whole file is rewritten locally, with just
> the changes going over the wire. Is that right?
>  The question is: is there an efficient/safe way to change 10k of data
> in a 1GB file without a lot of rewrites? Does rsync use some technique
> for that, or is it totally dependent on the filesystem?
> 
>  Thanks a lot!
> 
> -- 
> 
> [http://www.posix.brte.com.br/blog]
> --------==== pOSix rules ====-------
> -- 
> Please use reply-all for most replies to avoid omitting the mailing list.
> To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

-- 
http://benjamin-schweizer.de/contact
