[Bug 9041] New: Feature request: Better handling of btrfs based sources

samba-bugs at samba.org
Fri Jul 13 15:58:45 MDT 2012


https://bugzilla.samba.org/show_bug.cgi?id=9041

           Summary: Feature request: Better handling of btrfs based
                    sources
           Product: rsync
           Version: 3.1.0
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: rsync at sanitarium.net
         QAContact: rsync-qa at samba.org


Rsync currently does not handle btrfs filesystems that use subvolumes and
snapshots efficiently.

Btrfs has subvolumes, and snapshots of those subvolumes, which look like
ordinary subdirectories (they don't show up as mount points).  One potential
use case is keeping production and development versions of a tree in the same
filesystem, using copy-on-write (CoW) to save space.

This makes it possible for the same file (modified or not) to appear in
multiple places within the filesystem with the same inode number, but without
the increased link count that a hard link would have.
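The difference between a hard link and a file carried into a snapshot shows up in the fields stat(2) returns. Below is a minimal C sketch, assuming Linux btrfs semantics: a hard link shares both st_dev and st_ino and has st_nlink > 1, whereas btrfs reports a separate st_dev for every subvolume, so a file reached through a snapshot keeps its st_ino but appears on a different device with st_nlink == 1. Because btrfs numbers inodes per subvolume, a matching st_ino across devices is only a hint that the data may still be shared. The enum and classify_pair() are hypothetical names, not rsync code.

```c
#include <assert.h>
#include <sys/stat.h>

/* Hypothetical relationship between two stat(2) results. */
enum inode_relation {
    REL_UNRELATED,          /* different inode numbers */
    REL_SAME_FILE,          /* same st_dev and st_ino, single link: one file seen twice */
    REL_HARD_LINK,          /* same st_dev and st_ino, st_nlink > 1: what -H detects */
    REL_SNAPSHOT_CANDIDATE  /* same st_ino on a different st_dev, e.g. another btrfs subvolume */
};

static enum inode_relation classify_pair(const struct stat *a, const struct stat *b)
{
    assert(a && b);
    if (a->st_ino != b->st_ino)
        return REL_UNRELATED;
    if (a->st_dev == b->st_dev)
        return a->st_nlink > 1 ? REL_HARD_LINK : REL_SAME_FILE;
    /* Inode numbers are allocated per subvolume on btrfs, so this is only a
     * hint: the two files may share data, or may be entirely unrelated. */
    return REL_SNAPSHOT_CANDIDATE;
}
```

Given stat results for btrfs/current/NoTouching.mp3 and btrfs/old/NoTouching.mp3, classify_pair() should return REL_SNAPSHOT_CANDIDATE, since both report inode 258 with a link count of 1.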

Here is a simple example using two small btrfs filesystems on a Gentoo Linux
system with current btrfs tools:

# I start with 2 empty filesystems...
> df -hT /test/*/
Filesystem                Type   Size  Used Avail Use% Mounted on
/dev/mapper/vg-test_btrfs btrfs  1.0G   56K  382M   1% /test/btrfs
/dev/mapper/vg-test_rsync btrfs  1.0G   56K  382M   1% /test/rsync

# I create a subvolume within the btrfs filesystem
> btrfs sub create btrfs/current
Create subvolume 'btrfs/current'

# I copy 2 mp3 files into the subvolume
> cp -v *.mp3 btrfs/current/
`ChangeMe.mp3' -> `btrfs/current/ChangeMe.mp3'
`NoTouching.mp3' -> `btrfs/current/NoTouching.mp3'

# Now I snapshot that subvolume
> btrfs sub snapshot btrfs/current btrfs/old
Create a snapshot of 'btrfs/current' in 'btrfs/old'

# As you can see, there are now two subvolumes that contain the exact same two
# files with the same inode numbers:
> ls -li btrfs/*/*.mp3
257 -rw-r----- 1 root root 62060544 Jul 13 16:55 btrfs/current/ChangeMe.mp3
258 -rw-r----- 1 root root 46897152 Jul 13 16:59 btrfs/current/NoTouching.mp3
257 -rw-r----- 1 root root 62060544 Jul 13 16:55 btrfs/old/ChangeMe.mp3
258 -rw-r----- 1 root root 46897152 Jul 13 16:59 btrfs/old/NoTouching.mp3

# Now I change one of the files in one of the subvolumes
> id3v2 -D btrfs/current/ChangeMe.mp3 
Stripping id3 tag in "btrfs/current/ChangeMe.mp3"...id3v1 and v2 stripped.

# The inode numbers are still the same, but current/ChangeMe.mp3 now has an
# updated mtime and a different file size while keeping the same inode number.
> ls -li btrfs/*/*.mp3
257 -rw-r----- 1 root root 62060157 Jul 13 17:01 btrfs/current/ChangeMe.mp3
258 -rw-r----- 1 root root 46897152 Jul 13 16:59 btrfs/current/NoTouching.mp3
257 -rw-r----- 1 root root 62060544 Jul 13 16:55 btrfs/old/ChangeMe.mp3
258 -rw-r----- 1 root root 46897152 Jul 13 16:59 btrfs/old/NoTouching.mp3

# Now I rsync the whole thing...
> rsync -vaihhH --stats btrfs/ rsync/
sending incremental file list
.d..tp..... ./
cd+++++++++ current/
>f+++++++++ current/ChangeMe.mp3
>f+++++++++ current/NoTouching.mp3
cd+++++++++ old/
>f+++++++++ old/ChangeMe.mp3
>f+++++++++ old/NoTouching.mp3

Number of files: 7
Number of files transferred: 4
Total file size: 207.82M bytes
Total transferred file size: 207.82M bytes
Literal data: 207.82M bytes
Matched data: 0 bytes
File list size: 161
File list generation time: 0.004 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 207.85M
Total bytes received: 99

sent 207.85M bytes  received 99 bytes  59.38M bytes/sec
total size is 207.82M  speedup is 1.00

# As you can see, rsync treats these as 4 completely unrelated files.  It has
# no idea that the two copies of NoTouching.mp3 are the same file even though
# they share an inode number.  Using -H doesn't help because they aren't hard
# links, so each link count is only 1.
> ls -li rsync/*/*.mp3
259 -rw-r----- 1 root root 62060157 Jul 13 17:13 rsync/current/ChangeMe.mp3
260 -rw-r----- 1 root root 46897152 Jul 13 17:12 rsync/current/NoTouching.mp3
261 -rw-r----- 1 root root 62060544 Jul 13 17:12 rsync/old/ChangeMe.mp3
262 -rw-r----- 1 root root 46897152 Jul 13 17:12 rsync/old/NoTouching.mp3

# Disk space usage is increased accordingly:
> df -hT /test/*/
Filesystem                Type   Size  Used Avail Use% Mounted on
/dev/mapper/vg-test_btrfs btrfs  1.0G  164M  219M  43% /test/btrfs
/dev/mapper/vg-test_rsync btrfs  1.0G  209M  174M  55% /test/rsync
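The df figures can be checked against the file sizes in the ls -li listings. Assuming the id3v2 rewrite left current/ChangeMe.mp3 sharing no data with old/ChangeMe.mp3 (the source's usage suggests as much), the source stores three distinct files while the rsync copy stores four. The totals below use binary units (1M = 1,048,576 bytes), which is what the 207.82M reported by rsync corresponds to:

```latex
\[
\begin{aligned}
\text{rsync copy}   &= 62060157 + 46897152 + 62060544 + 46897152 = 217915005\ \text{B} \approx 207.82\ \text{MiB}\\
\text{btrfs source} &= 62060157 + 62060544 + 46897152 = 171017853\ \text{B} \approx 163.10\ \text{MiB}\\
\text{difference}   &= 217915005 - 171017853 = 46897152\ \text{B} \approx 44.72\ \text{MiB}
\end{aligned}
\]
```

The roughly 45M gap reported by df (209M versus 164M) is therefore one extra copy of NoTouching.mp3; the remaining differences of about 1M from the computed totals are presumably filesystem metadata plus df's rounding.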

Note that I am not asking rsync to duplicate the subvolume or snapshot
functionality, just to recognize that the same file exists in multiple
locations, much like a hard link but without the shared link count.
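Recognizing such files amounts to remembering which inodes have already been seen. Below is a minimal sketch, assuming a simple fixed-size linear table rather than rsync's actual hard-link code. With only_multilinked set, it follows the -H rule of considering only files whose link count exceeds 1, matching on (st_dev, st_ino). With it cleared, it records every file and matches on st_ino alone, because each btrfs subvolume reports its own st_dev. The struct and function names are hypothetical, and a match is only a hint, since unrelated subvolumes reuse the same inode numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_SEEN 1024 /* fixed capacity keeps the sketch short */

/* Hypothetical record of the first path seen for an inode. */
struct seen_inode {
    dev_t dev;
    ino_t ino;
    const char *first_path;
};

struct inode_table {
    struct seen_inode entries[MAX_SEEN];
    size_t count;
    bool only_multilinked; /* true: -H style rule; false: remember every file */
};

/* Returns the earlier path that shares this file's inode, or NULL if none.
 * Unmatched files are remembered while space remains. */
static const char *inode_table_lookup(struct inode_table *t, const char *path,
                                      dev_t dev, ino_t ino, nlink_t nlink)
{
    assert(t && path);
    if (t->only_multilinked && nlink < 2)
        return NULL; /* the -H rule skips singly linked files entirely */

    for (size_t i = 0; i < t->count; i++) {
        const struct seen_inode *e = &t->entries[i];
        /* -H style matching requires the same device; the "every file" mode
         * ignores st_dev because each btrfs subvolume reports its own. */
        if (e->ino == ino && (!t->only_multilinked || e->dev == dev))
            return e->first_path;
    }
    if (t->count < MAX_SEEN)
        t->entries[t->count++] = (struct seen_inode){ dev, ino, path };
    return NULL;
}
```

For the transfer above, the "every file" mode would report old/NoTouching.mp3 and old/ChangeMe.mp3 as matching their current/ counterparts, while the -H mode reports nothing because every file has a link count of 1.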

It seems to me the quickest way to accomplish this would be to add an option
that works like --hard-links, except that it remembers every file<>inode
number pairing instead of just those with a link count greater than 1.  Then,
when it finds a file whose inode number it has already seen, instead of writing
out a new file it would use the new clone ioctl (as cp --reflink does) to make
a duplicate that shares the existing data blocks and so consumes no additional
space for the file data.  After that it would do the standard quick check
(size and mtime) to see whether a delta-xfer is needed to update the cloned
file.


More information about the rsync mailing list