rsync behavior on copy-on-write filesystems

Allen Supynuk allen.supynuk at gmail.com
Tue May 21 16:06:42 MDT 2013


I have been doing some experiments with rsync on btrfs, a
copy-on-write file system that is approaching or having just achieved
production-ready status depending on your requirements.

For my purposes the reliability appears by almost all accounts to be
there, and the compression alone makes it very compelling.

However, the following two experiments show rsync behaviors that are
disappointing to the point of appearing to be bugs. rsync would
certainly be more useful if they were fixed. Of course, this assumes
that I have not missed something in my tests.

--

Bottom line on top: rsync with --inplace appears to (wastefully)
rewrite the entire file even when only a single block or just the
meta-data (timestamp) has changed. While this is necessary behavior on
some file systems it is wasteful on copy-on-write systems.

I propose that rsync be changed to write only the blocks that have
changed when --inplace is in effect, and to update only the meta-data
when nothing else has changed, if the underlying filesystem supports it.
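
To make the first half of the proposal concrete, here is a minimal sketch of
"write only the blocks that differ", done with dd and cmp rather than inside
rsync. sync_changed_blocks is a hypothetical helper name, not an rsync
feature; it assumes both files exist and that the destination is not longer
than the source, and it reads both files in full, so it saves writes rather
than reads.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): overwrite in place only those
# blocks of DST that differ from SRC. Prints the number of blocks written.
# Assumes DST exists and is not longer than SRC.
sync_changed_blocks() {
    src=$1
    dst=$2
    bs=${3:-4096}
    size=$(($(wc -c < "$src")))
    blocks=$(( (size + bs - 1) / bs ))
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    written=0
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # Extract block i of each file into a scratch file.
        dd if="$src" of="$a" bs="$bs" skip="$i" count=1 2>/dev/null
        dd if="$dst" of="$b" bs="$bs" skip="$i" count=1 2>/dev/null
        if ! cmp -s "$a" "$b"; then
            # conv=notrunc keeps the rest of DST untouched.
            dd if="$a" of="$dst" bs="$bs" seek="$i" conv=notrunc 2>/dev/null
            written=$((written + 1))
        fi
        i=$((i + 1))
    done
    rm -f "$a" "$b"
    echo "$written"
}
```

Called as sync_changed_blocks src/10gb current/10gb, this would be expected
to write a single 4k block in the first experiment below. I have not measured
how btrfs accounts for such a write.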

In the experiments below the final results would have been 20 GB - 4 KB
smaller had these changes been in place.
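
The figure can be checked with shell arithmetic. This sketch uses the block
size and count from the dd command in step 2 and assumes a 64-bit shell; the
variable names are only for illustration.

```shell
block=4096        # bs=4k in the dd commands
count=2621440     # count= used to create the 10 GB file
file_bytes=$((block * count))
# Experiment 1 needed one 4k block; experiment 2 needed no data at all.
saved=$((2 * file_bytes - block))
echo "file size:        $file_bytes bytes"
echo "potential saving: $saved bytes"
```

The first value matches the 10737418240 bytes that dd reports in step 2.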

(Running rsync 3.0.9)

################################################
## Test rsync --inplace

## 1) Start with an empty filesystem

$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
                      300G   36M  293G   1% /vol/jobarchive_Ajobarchivetest2
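
Because df -h rounds to whole gigabytes, a finer-grained measurement can help
when repeating these steps. The sketch below reads the Used column in 1 KiB
units; used_kb is a hypothetical helper name. The -P option keeps each
filesystem on one line, so long device names do not wrap as they do above.
On btrfs the number may lag until data is flushed, so running sync first is
probably wise.

```shell
# Hypothetical helper: print the "Used" column of df in 1 KiB units
# for the filesystem holding the given path (default: current directory).
used_kb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print $3 }'
}

before=$(used_kb .)
# ... run the step being measured here ...
after=$(used_kb .)
echo "step consumed $((after - before)) KiB"
```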

## 2) Create a subvolume. Put one file with 10gb of random data in it.
##    Note: Compression is turned on, but our random data defeats it.

$ btrfs subvolume create src
$ time dd if=/dev/urandom of=src/10gb bs=4k count=2621440 conv=notrunc
2621440+0 records in
2621440+0 records out
10737418240 bytes (11 GB) copied, 811.427 s, 13.2 MB/s
0.400u 806.115s 13:31.42 99.3%  0+0k 0+20971520io 0pf+0w

$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
                      300G   11G  283G   4% /vol/jobarchive_Ajobarchivetest2

## 3) Create a second subvolume called current. Copy the first file into it.

$ btrfs subvolume create current
$ time cp --archive src/* current
0.057u 17.389s 0:42.29 41.2%    0+0k 19737984+20971520io 0pf+0w

$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
                      300G   20G  274G   7% /vol/jobarchive_Ajobarchivetest2

## 4) Make a snapshot of the second volume called job1. Note that it
##    takes up almost no space.

$ btrfs subvolume snapshot current job1
$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
                      300G   21G  273G   7% /vol/jobarchive_Ajobarchivetest2

## 5) Change the first 4k bytes of the original file

$ time dd if=/dev/urandom of=src/10gb  bs=4k count=1 conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000601676 s, 6.8 MB/s
0.001u 0.001s 0:00.03 0.0%      0+0k 32+8io 1pf+0w

$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
                      300G   21G  273G   7% /vol/jobarchive_Ajobarchivetest2

## 6) Use rsync --inplace to make a copy of the first file.
##    Note:
##    - We use --inplace to copy over the existing file
##    - We do not use -W aka --whole-file so the delta-xfer algorithm
##      should be in play
##    - The hope is that rsync will only rewrite the first block of the file

$ time /usr/share/sbtools-sbjobarchive/external-apps/rsync/rsync-3.0.9/install/centos5-64/bin/rsync \
    --stats -az --timeout=600 --inplace src/ current/

Number of files: 2
Number of files transferred: 1
Total file size: 10737418240 bytes
Total transferred file size: 10737418240 bytes
Literal data: 10737418240 bytes
Matched data: 0 bytes
File list size: 52
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10742659175
Total bytes received: 34

sent 10742659175 bytes  received 34 bytes  11012464.59 bytes/sec
total size is 10737418240  speedup is 1.00
851.783u 79.265s 16:14.78 95.5% 0+0k 19752416+20971520io 17pf+0w

## 7) Alas the new file takes up a full extra 10 GB.

$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
                      300G   31G  263G  11% /vol/jobarchive_Ajobarchivetest2

## Conclusion: rsync rewrote the entire file into current/10gb even
## though it only needed to write the first 4k block. If it had, we
## would have saved 10 GB - 4 KB of disk space.
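
One variable worth ruling out before calling this a bug: the rsync man page
says --whole-file is the default when both source and destination are local
paths, which is the case in step 6, and the "Matched data: 0 bytes" line is
consistent with that. Below is a scaled-down sketch of steps 2 to 6 with
--no-whole-file added. rerun_with_delta is a hypothetical wrapper, and the
temporary directory and 1 MB file size are placeholders. Whether this reduces
the space used on btrfs was not measured here.

```shell
# Scaled-down rerun of steps 2-6 with the delta-xfer algorithm forced on.
# rerun_with_delta is a hypothetical wrapper; all paths are placeholders.
rerun_with_delta() {
    work=$(mktemp -d) || return 1
    mkdir "$work/src" "$work/current"
    dd if=/dev/urandom of="$work/src/data" bs=4k count=256 2>/dev/null
    # Give the file an old mtime so the later change is always detected.
    touch -t 201305210000 "$work/src/data"
    cp -p "$work/src/data" "$work/current/"
    # Change only the first 4k block, as in step 5.
    dd if=/dev/urandom of="$work/src/data" bs=4k count=1 conv=notrunc 2>/dev/null
    # --no-whole-file overrides the local-copy default of --whole-file.
    rsync --stats -a --inplace --no-whole-file "$work/src/" "$work/current/"
    cmp "$work/src/data" "$work/current/data"
    status=$?
    rm -rf "$work"
    return "$status"
}
```

In the --stats output, a non-zero "Matched data" value would indicate that
the delta-xfer algorithm was actually used.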

################################################

## Test metadata-only change

## Start with files as above

$ btrfs subvolume snapshot current job2
Create a snapshot of 'current' in './job2'
$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
                      300G   31G  263G  11% /vol/jobarchive_Ajobarchivetest2

## Change the meta-data of the first file then rsync with --inplace

$ touch src/10gb
$ time /usr/share/sbtools-sbjobarchive/external-apps/rsync/rsync-3.0.9/install/centos5-64/bin/rsync \
    --stats -az --timeout=600 --inplace src/ current/

Number of files: 2
Number of files transferred: 1
Total file size: 10737418240 bytes
Total transferred file size: 10737418240 bytes
Literal data: 10737418240 bytes
Matched data: 0 bytes
File list size: 52
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10742659172
Total bytes received: 31

sent 10742659172 bytes  received 31 bytes  10620523.19 bytes/sec
total size is 10737418240  speedup is 1.00
920.122u 82.526s 16:50.71 99.2% 0+0k 20469728+20971520io 0pf+0w

$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
                      300G   41G  253G  14% /vol/jobarchive_Ajobarchivetest2

## Conclusion: The entire file was copied on the destination system even
## though only meta-data had changed. This is not necessary with a
## copy-on-write system.
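
The behavior hoped for in this second test can be sketched outside rsync:
when the destination already holds the same bytes, copy only the timestamps.
sync_metadata_only is a hypothetical helper name. It uses cmp, which reads
both files in full but writes no data blocks, and touch -r, which copies the
access and modification times from the reference file.

```shell
# Hypothetical helper: if DST already has the same content as SRC, copy only
# the timestamps; otherwise report that a data transfer is still needed.
sync_metadata_only() {
    if cmp -s "$1" "$2"; then
        touch -r "$1" "$2"    # timestamps only; file data is not rewritten
    else
        echo "content differs: $2" >&2
        return 1
    fi
}
```

For the test above this would be called as
sync_metadata_only src/10gb current/10gb after the touch.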

-- 
Allen.Supynuk at gmail.com

