[clug] some bash/awk scripting. use MD5s (md5sum) to find and collapse duplicate files
steve jenkin
sjenkin at canb.auug.org.au
Mon Jan 23 21:54:57 MST 2012
I needed to create this for myself; others may find it useful.
There may well be better and more well known solutions out there...
I didn't look.
The Perl mavens on-list may like to rewrite it.
Just the sort of job Larry invented it for :-)
<http://members.tip.net.au/~sjenkin/code/dedup.tar>
The README:
Dedup.tar
Steve Jenkin Tue 24 Jan 2012 15:30:54 EST
mailto:sjenkin at tip.net.au
http://members.tip.net.au/~sjenkin/code/dedup.tar
## A set of scripts that use MD5s to find duplicate files
and then collapse the duplicates with 'hard links' (see POSIX filesystem
semantics).
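As a rough illustration of the idea (this is not the code in dedup.tar): hash every file, sort so that equal hashes sit next to each other, then print an 'ln -f' line for each file whose hash matches the one before it. The name find_dups is made up for this sketch. It assumes GNU md5sum and filenames without newlines, backslashes or double quotes, and it trusts the hash alone rather than comparing file contents. Hard links only work within a single filesystem.

```shell
#!/bin/bash
# Sketch only: print (do not run) one "ln -f KEEP DUP" line per duplicate file.
# find_dups is a hypothetical name, not one of the scripts in the tar file.
find_dups() {
    find "$1" -type f -exec md5sum {} + |
    LC_ALL=C sort |
    awk '{
        hash = $1
        # md5sum prints 32 hex digits, two separator characters, then the name
        name = substr($0, 35)
        if (hash == prev)
            printf "ln -f \"%s\" \"%s\"\n", keep, name
        else {
            prev = hash     # first file seen with this hash is the one kept
            keep = name
        }
    }'
}
```

Redirecting the output to a file, e.g. 'find_dups somedir >dedup.sh', leaves a list that can be read before anything is changed.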
The 'gen_rm_ln' script is awk [#!/usr/bin/awk]
The others use /bin/bash.
You'll need 'rm_ln' in your path.
These scripts are written to test for many errors and 'be safe',
BUT - caveat emptor. I'm sure there are edge cases I've missed.
WARNING: The scripts aren't "security hardened" - they could be abused
by hackers.
If you are *brave* then this pipeline will work:
md5_dupfl <dir> | gen_rm_ln | xargs -L 1 bash
(where <dir> is the directory/directories you want to dedup)
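For anyone unsure what 'xargs -L 1' does at the end of that pipeline: it runs the given command once per input line, with the words on that line as arguments. The sketch below puts echo in front of bash so nothing is executed. The function name per_line and the input line 'rm_ln a b' are placeholders; the real output format of gen_rm_ln is not shown here.

```shell
#!/bin/bash
# Sketch: show the command xargs would build for each input line.
# per_line is a hypothetical helper; echo stands in for actually running bash.
per_line() {
    xargs -L 1 echo "would run: bash"
}

# Placeholder input, one command per line.
printf 'rm_ln a b\nrm_ln c d\n' | per_line
```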
For options, see
md5_dupfl -h
When I run this with xargs on OS X, I get an error related to /bin/echo...
but it still does the work.
What I use:
md5_dupfl Nokia_Rigntones | gen_rm_ln >dedup.sh
<inspect flist, rerun etc>
sh dedup.sh
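One way to check the result afterwards, offered as a sketch rather than part of the tar file: bash's '-ef' test is true when two paths refer to the same device and inode, which is what a pair of hard links does. The function name same_file is made up here. Note that '-ef' is also true for a symlink pointing at the other file.

```shell
#!/bin/bash
# Sketch: report whether two paths are the same underlying file.
# same_file is a hypothetical helper, not one of the dedup scripts.
same_file() {
    if [ "$1" -ef "$2" ]; then
        echo "linked: $1 $2"
    else
        echo "separate: $1 $2"
    fi
}
```

Calling 'same_file fileA fileB' on two names that were collapsed should print the "linked:" line; 'ls -li' on the same files should show matching inode numbers.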
## Contents of tar file
-rw-r--r-- steve/steve 1279 2012-01-24 15:48 README
-rwxr-xr-x steve/steve 2118 2012-01-24 15:44 bin/md5_dupfl
-rwxr-xr-x steve/steve 1628 2012-01-24 13:30 bin/gen_rm_ln
-rwxr-xr-x steve/steve 3638 2012-01-21 19:13 bin/rm_ln
--
Steve Jenkin, Info Tech, Systems and Design Specialist.
0412 786 915 (+61 412 786 915)
PO Box 48, Kippax ACT 2615, AUSTRALIA
sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin