data deduplication

Benjamin Watkins ben-list at constant-technologies.com
Tue May 25 07:26:23 MDT 2010


On 5/25/2010 6:41 AM, Mag Gam wrote:
> I know rsync can do many things but I was wondering if anyone is using
> it for data deduplication on a large filesystem. I have a filesystem
> which is about 2TB and I want to make sure I don't have the same data
> in a different place of a filesystem. Is there an algorithm for that?
>    

While rsync is not an appropriate tool for this, I have successfully 
used dupseek in the past.

     http://freshmeat.net/projects/dupseek/

It is a Perl script, so I expect you should be able to use it on any 
platform you need.  It lists support for POSIX/Linux, but I expect it can 
run under Windows as well if you are comfortable with Cygwin.

I'm sure there are many more tools like this.  I used this one because 
it was optimized for large files.
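
As for the question about an algorithm: one common general approach (not 
necessarily what dupseek does internally) is to group files by size first 
and then hash only the files that share a size with at least one other 
file.  Below is a minimal Python sketch under that assumption.  The names 
find_duplicates and file_digest are hypothetical, chosen only for 
illustration, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups; each group is a sorted list of paths
    under `root` whose contents hash to the same digest."""
    # Pass 1: bucket regular files by size. Files with a unique size
    # cannot have a duplicate, so they are never read.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # file vanished or is unreadable

    # Pass 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(p) for p in by_hash.values() if len(p) > 1)
    return groups
```

Calling find_duplicates("/some/path") returns the groups of matching 
files; it only reports them and does not delete or link anything.  On a 
filesystem of around 2TB the hashing pass will dominate the run time, 
which is why the size check comes first.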

-Ben


