data deduplication
Benjamin Watkins
ben-list at constant-technologies.com
Tue May 25 07:26:23 MDT 2010
On 5/25/2010 6:41 AM, Mag Gam wrote:
> I know rsync can do many things but I was wondering if anyone is using
> it for data deduplication on a large filesystem. I have a filesystem
> which is about 2TB and I want to make sure I don't have the same data
> in a different place of a filesystem. Is there an algorithm for that?
>
While rsync is not an appropriate tool for this, I have successfully
used dupseek in the past.
http://freshmeat.net/projects/dupseek/
It is a Perl script, so I expect you should be able to use it on any
platform you need. It lists support for POSIX/Linux, but I expect it can
run under Windows as well if you are comfortable with Cygwin.
I'm sure there are many more tools like this. I used this one because
it was optimized for large files.
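For anyone who wants to see the general idea, here is a minimal sketch in
Python of one common way to find duplicate files. This is not dupseek's
code, and I am not claiming it is how dupseek works internally. It assumes
that grouping files by size first is an acceptable shortcut, since files of
different sizes cannot be identical, so only files that share a size get
read and hashed. The function names (file_digest, find_duplicates) and the
1 MiB chunk size are my own choices for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups of paths under root with identical content."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_duplicates("/some/path") returns a list of lists, one per set
of identical files. Note that hard links to the same file will show up as
duplicates of each other in this sketch, and on a 2TB filesystem you would
probably want to compare a small prefix of each candidate before hashing
the whole file.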
-Ben