[clug] finding duplicate sections in a text file
Brett Worth
brett.worth at gmail.com
Wed Jan 22 05:42:07 UTC 2020
On 22/1/20 2:51 pm, steve jenkin via linux wrote:
> - Anyone have a better algorithm?
> E.g. using ‘git’ or another Version Control System
> This is One Big File (done in sections), not many files in a directory, though I could try that next time.
Better? Probably not.
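Here's my five-minute solution. It splits the input into one file per line, lets fdupes delete the duplicate files, and concatenates what is left: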
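#!/bin/bash
# Split the input into one file per line, delete duplicate files, then reassemble.
INFILE="$1"
WORKDIR=$(mktemp -d)
# One line per file; 8-digit numeric suffixes keep the pieces in input order.
split --suffix-length=8 --lines=1 -d "$INFILE" "$WORKDIR/"
# Delete duplicates without prompting, keeping the first file of each set.
fdupes -q -d -N "$WORKDIR" >/dev/null
cat "$WORKDIR"/* > "$INFILE.deduped"
rm -rf "$WORKDIR"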
Does use a lot of files. :-)
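If the file count is a worry, the same line-level dedupe can probably be done in one pass with awk and no temporary files. This is only a sketch and not part of the script above: the function name dedupe_lines is made up for illustration, and it assumes the distinct lines fit in memory.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print each distinct
# line of a file once, in first-seen order, without temporary files.
dedupe_lines() {
    # seen[$0]++ evaluates to 0 the first time a line appears, so
    # !seen[$0]++ is true only for first occurrences; awk prints those.
    awk '!seen[$0]++' "$1"
}

# Mirror the original script's output naming when a file is given.
if [ -n "${1:-}" ]; then
    dedupe_lines "$1" > "$1.deduped"
fi
```

Like the split/fdupes version, this compares whole lines, so it only catches duplicate sections when their lines match exactly.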
Brett
--
-- /) _ _ _/_/ / / /__ _ _//
-- /_)/</= / / (_(_//_//< ///