data deduplication

Mag Gam magawake at gmail.com
Tue May 25 17:22:03 MDT 2010


Thanks

On Tue, May 25, 2010 at 1:26 PM, Benjamin Watkins
<ben-list at constant-technologies.com> wrote:
> On 5/25/2010 6:41 AM, Mag Gam wrote:
>>
>> I know rsync can do many things but I was wondering if anyone is using
>> it for data deduplication on a large filesystem. I have a filesystem
>> which is about 2TB and I want to make sure I don't have the same data
>> in a different place of a filesystem. Is there an algorithm for that?
>>
>
> While rsync is not an appropriate tool for this, I have successfully used
> dupseek in the past.
>
>    http://freshmeat.net/projects/dupseek/
>
> It is a perl script, so I expect you should be able to use it on any
> platform you need.  It shows support for POSIX/Linux, but I expect it can run
> under Windows as well if you are comfortable with Cygwin.
>
> I'm sure there are many more tools like this.  I used this one because it
> was optimized for large files.
>
> -Ben
>
>
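
On the "is there an algorithm for that?" question quoted above: one common approach to finding duplicate files is to group them by size first and hash only the files whose size collides with another's. Below is a minimal Python sketch of that idea. It is not how dupseek itself works (I have not checked its internals); the function names find_duplicates and file_digest, the choice of SHA-256, and the 1 MiB read size are all assumptions made for illustration. On a 2TB filesystem, hashing every same-size file in full may be slow, so treat it as a starting point rather than a tuned tool.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root with identical content hashes."""
    # Pass 1: bucket regular files by size; a file with a unique size
    # cannot have a duplicate, so it never needs to be read.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling find_duplicates("/some/mount/point") (a hypothetical path) returns a list of lists, each holding the paths that share the same content hash. Hard links to the same inode will be reported as duplicates too, and matching files should be compared byte for byte before anything is deleted.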

