Avoiding transferring duplicate files

Jon me at jonwatson.ca
Thu Feb 14 02:03:01 GMT 2008


On Feb 12, 2008 7:53 AM, Tim Brody <tdb01r at ecs.soton.ac.uk> wrote:

> Hi All,
>
> I have a 75GB collection of data, including a lot of duplicated files,
> on an NTFS network drive. I want to back up that data across a DSL link to
> a Linux host. Currently I use cwrsync on a Windows machine to act as
> server to the Linux rsync client.
>
> I want to avoid transferring duplicated data, as the DSL link is a far
> more significant factor than computation/disk IO. I can't work out
> whether rsync (or any patch) will make it smart enough to spot duplicate
> files, regardless of file location (like fdupes or similar). Because
> this is coming off a network drive there's no way I can "hard link" (or
> NTFS equivalent) duplicates on the source tree, so it needs to happen in
> rsync.
>
> I've tried using the --detect-renamed patch on 3.0.0 in the following
> (made-up) setup:
>
> src/
> src/dup
> src/dup/tardis.mp3
> src/tardis.mp3
> src/tardis2.mp3
>
> ../rsync-3.0.0pre9/rsync -avi --detect-renamed --fuzzy --checksum src/ dest/
> building file list ... done
> .d..t...... ./
>  >f+++++++++ tardis.mp3
>  >f+++++++++ tardis2.mp3
> cd+++++++++ dup/
>  >f+++++++++ dup/tardis.mp3
>
> sent 167076 bytes
>
> Which is 3x the size of "tardis.mp3".
>
> If I remove tardis2.mp3:
>  >f+++++++++ tardis2.mp3
>
> sent 536 bytes  received 526 bytes  193.09 bytes/sec
>
> If I remove dup/tardis.mp3:
>  >f+++++++++ dup/tardis.mp3
>
> sent 55801 bytes  received 34 bytes  111670.00 bytes/sec
>
> I've found some threads about duplicate files/the bug related to the
> detect-renamed above, but nothing specifically about doing a blanket
> search for duplicates similar to fdupes.
>
> Any suggestions would be helpful.
>
> Thanks,
> Tim.


Can you run fdupes or a find command to build a list of the duplicate
files and then feed it to rsync via the --include-from or --exclude-from
switches?
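
As a rough illustration of that idea, here is a minimal Python sketch that
does the fdupes-style pass itself: it groups files by size, then by SHA-256,
keeps the first path of each duplicate group, and writes the rest as
anchored patterns for --exclude-from. The function names and the output
file name are made up for this example, and it assumes no file name contains
rsync wildcard characters (*, ?, [), which would need escaping.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(root):
    """Group files under root by content; return only groups of 2+ files.

    Paths are relative to root and use '/' separators.
    """
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            by_size[os.path.getsize(full)].append(full)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for full in paths:
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            by_digest[file_digest(full)].append(rel)

    return sorted(sorted(g) for g in by_digest.values() if len(g) > 1)


def exclude_patterns(groups):
    """Keep the first path of each group; anchor the rest with a leading '/'."""
    return sorted("/" + rel for group in groups for rel in group[1:])


def write_exclude_file(root, out_path):
    """Write one exclude pattern per line; out_path is a hypothetical name."""
    patterns = exclude_patterns(duplicate_groups(root))
    with open(out_path, "w", encoding="utf-8") as out:
        for pattern in patterns:
            out.write(pattern + "\n")
    return patterns
```

After something like write_exclude_file("src", "dup-excludes.txt"), the
transfer could be run as "rsync -avi --exclude-from=dup-excludes.txt src/
dest/". The list has to be readable by whichever side runs the rsync
command, and the excluded duplicates will not exist on the destination
until a separate step recreates them there (for example by copying or
hard-linking the kept file), which this sketch does not do.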

Jon