rsync - using a --files-from list to cut out scanning. How to handle deletions?
kmk at sanitarium.net
Tue Jan 15 07:25:05 MST 2013
If you are going to do it this way, please be aware of the following:
if a file already exists in the target directory when using --link-dest,
rsync modifies the link rather than replacing it, which means you don't
have history for files that have been replaced rather than added or deleted.
If you are dealing with backing up many millions of files then I
suggest looking into a more advanced filesystem that can handle this
functionality internally rather than using --link-dest. Currently
that is limited to ZFS or BTRFS (if you are brave).
Both of these filesystems have subvolumes and subvolume snapshot
capabilities. This means you can do something similar to an lvm2
snapshot at the directory level instead of the whole filesystem. You
can rsync with the same target directory each run and do a snapshot of
that target between runs. The recycling concept is not needed because
deleting an old snapshot is much faster than doing an rm -rf on a huge
tree of hard links. This is especially true on ZFS which usually does
the job in <1 second regardless of size. Unfortunately BTRFS usually
completes the command quickly but the space is then slowly reclaimed
by a kernel thread in the background.
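
To make the "same target, snapshot between runs" idea concrete, here is a
rough sketch. The paths, dataset names, snapshot names and the DRY_RUN switch
are placeholders I made up for illustration, not anything from a real setup.
By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: rsync into one fixed target, then snapshot that target.
# Every path, dataset name and the DRY_RUN switch is hypothetical.

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_btrfs SRC SUBVOL SNAPDIR STAMP
# SUBVOL is the BTRFS subvolume that rsync always writes into.
backup_btrfs() {
    run rsync -a --delete "$1/" "$2/" &&
    run btrfs subvolume snapshot -r "$2" "$3/$4"
}

# backup_zfs SRC DATASET MOUNTPOINT STAMP
# MOUNTPOINT is where DATASET is mounted; rsync always writes there.
backup_zfs() {
    run rsync -a --delete "$1/" "$3/" &&
    run zfs snapshot "$2@$4"
}

# Expiring an old backup is a single snapshot delete, not an rm -rf.
expire_btrfs() { run btrfs subvolume delete "$1/$2"; }   # SNAPDIR STAMP
expire_zfs()   { run zfs destroy "$1@$2"; }              # DATASET STAMP
```

For example, "backup_zfs /data tank/backup /tank/backup 2013-01-15" would
print the rsync command followed by "zfs snapshot tank/backup@2013-01-15".
Set DRY_RUN=0 only after checking the printed commands against your layout.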
Here is something I wrote up about it a while back:
It is a little out of date now and since I wrote it for a LUG it only
covers BTRFS. A FreeBSD 9 system with at least 8GB of RAM running ZFS
will outperform pretty much any Linux system running BTRFS (currently)
which will outperform any Linux system running ext4 and --link-dest.
On 01/14/13 22:45, Robert Bell wrote:
> We use rsync extensively for protecting data by making backups.
> Thank you to the authors and maintainers.
> Like many others, we use the --link-dest option to cut down on the
> space occupied by the backups.
> Unlike many others, we re-cycle old backup directories. Since most
> file systems change only slowly (ours average about 0.5% of files
> and about 1.5% of data being churned each day), a recycled
> directory is a good start for the next backup. Our most common
> case is that a directory from 5 days ago becomes the target for the
> current backup, with yesterday's backup being provided by a
> --link-dest= setting.
> Since the source file system changes only slowly, I have been
> thinking about ways to speed up the backups in the future. One way
> is to have the backups deal only with files that have changed on
> the source since the last backup. This would save having to scan
> the whole source and destination areas each time a backup is done.
> The Linux inotify capability looks like it might be useful for
> collecting a list of changed files.
> Has anyone done this?
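
One way this could look, as a sketch only: let inotifywait (from the
inotify-tools package) log the full path of every changed file, then reduce
that log to a unique list of paths relative to the source before each backup.
The directory /data/src, the log file name and the helper events_to_list are
all made up for illustration, and the set of events to watch is a guess that
you would want to tune.

```shell
#!/bin/sh
# Sketch: turn a log of absolute paths into a --files-from style list.
# events_to_list is a hypothetical helper, not part of rsync or inotify-tools.

# events_to_list SRC
# Reads absolute paths (one per line) on stdin and prints the unique
# paths relative to SRC. Lines outside SRC are dropped.
events_to_list() {
    src=${1%/}
    while IFS= read -r p; do
        case $p in
            "$src"/*) printf '%s\n' "${p#"$src"/}" ;;
        esac
    done | sort -u
}

# Assumed usage (not run here):
#   inotifywait -m -r -q -e close_write,create,delete,move \
#       --format '%w%f' /data/src >> /var/tmp/events.log &
#   ...later, at backup time...
#   events_to_list /data/src < /var/tmp/events.log > list
```

Keep in mind that inotify watches have per-user limits and events can be
missed while the watcher is not running, so an occasional full scan is
probably still needed as a safety net.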
> However, there is one case that I have not been able to get to work
> in a test of rsync. This is the case where a file exists in the
> destination, does not exist in the source, but is named in the
> --files-from= list. This would be the case if a file had been
> deleted from the source. We would want rsync in this case to
> delete the file on the destination.
> However, with a test command like:
> rsync -a -i --delete --files-from=list --link-dest=../linked
> source/ dest
> I was unable to get rsync to delete on the destination a file which
> did not exist in the source but was named in the list. rsync
> baulked at a file being listed that was not in the source. For example:
> rsync: link_stat "/data/flush/inter/bel107.80527/source/0yyy"
> failed: No such file or directory (2)
> [The test file 0yyy existed in the destination, the link-dest area
> and in the list, but not in the source.]
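
As far as I know, --delete only acts on directories that rsync actually
scans, and --files-from turns off the recursion that -a normally implies, so
it will not remove a file just because it is named in the list. A possible
workaround, sketched below, is to split the list yourself: hand rsync only the
paths that still exist in the source and remove the rest from the destination
directly. The helper names and the list.present / list.missing files are
invented for this example.

```shell
#!/bin/sh
# Sketch: handle deletions outside rsync when using --files-from.
# split_list and remove_missing are hypothetical helpers.

# split_list SRC LIST PRESENT MISSING
# Writes paths from LIST that exist under SRC to PRESENT, the rest to MISSING.
split_list() {
    : > "$3"
    : > "$4"
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # skip blank lines
        if [ -e "$1/$path" ] || [ -L "$1/$path" ]; then
            printf '%s\n' "$path" >> "$3"
        else
            printf '%s\n' "$path" >> "$4"
        fi
    done < "$2"
}

# remove_missing DEST MISSING
# Removes each path listed in MISSING from under DEST.
remove_missing() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # never touch DEST itself
        rm -rf -- "$1/$path"
    done < "$2"
}

# Assumed usage, mirroring the test command above (not run here):
#   split_list source list list.present list.missing
#   rsync -a -i --files-from=list.present --link-dest=../linked source/ dest
#   remove_missing dest list.missing
```

Since dest is a tree of hard links, the rm only drops the link in that one
backup; older backups keep their copy. The list is trusted as-is here, so
paths containing ".." should be filtered out before using this for real.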
> Thanks to those who have read down to here. :-)
> Regards Rob. Bell e-mail: Robert.Bell at csiro.au -- Dr
> Robert C. Bell, BSc (Hons) PhD Technical Services Manager Advanced
> Scientific Computing CSIRO IM&T
> Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
> Robert.Bell at csiro.au | http://www.csiro.au/ |
> http://www.hpsc.csiro.au/ Addresses: Street: CSIRO ASC Level 11,
> 700 Collins Street, Docklands Vic 3008, Australia Postal: CSIRO ASC
> Level 11, GPO Box 1289, Melbourne Vic 3001, Australia
Kevin Korb Phone: (407) 252-6853
Systems Administrator Internet:
FutureQuest, Inc. Kevin at FutureQuest.net (work)
Orlando, Florida kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.