[clug] finding duplicate sections in a text file

Brett Worth brett.worth at gmail.com
Wed Jan 22 05:42:07 UTC 2020


On 22/1/20 2:51 pm, steve jenkin via linux wrote:
> 	- Anyone have a better algorithm?
> 		E.g.  using ‘git’ or another Version Control System
> 		This is One Big File (done in sections), not many files in a directory, though I could try that next time.

Better?  Probably not.

Here's my 5-minute solution:

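#!/bin/bash
# Remove duplicate lines from INFILE, writing the result to INFILE.deduped.

INFILE=$1
WORKDIR=$(mktemp -d)

# Write each input line to its own file, named by an 8-digit sequence number.
split --suffix-length=8 --lines=1 -d "${INFILE}" "${WORKDIR}/"
# Delete all but one file from each set of identical files, without prompting.
fdupes -q -d -N "${WORKDIR}" >/dev/null
# Reassemble the surviving lines in their original order.
cat "${WORKDIR}"/* > "${INFILE}.deduped"
rm -rf "${WORKDIR}"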


It does use a lot of files, though: one per line of input. :-)
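
For comparison, here is a minimal sketch that avoids the temporary files altogether. It swaps fdupes for the common awk idiom '!seen[$0]++', which is not what the script above does internally, and the function name dedupe_lines is made up for illustration. Like the script, it works line by line rather than on whole sections, and it always keeps the first occurrence of each line.

```shell
#!/bin/bash

# dedupe_lines (hypothetical name): print each distinct line of the
# given file once, keeping the first occurrence and the original order.
dedupe_lines() {
    # seen[$0] is 0 (false) the first time a line appears, so the
    # negation is true and awk prints the line; the post-increment then
    # marks it as seen, so later copies are skipped.
    awk '!seen[$0]++' "$1"
}
```

Something like `dedupe_lines "$INFILE" > "$INFILE.deduped"` would then give the same output file name as the script. The trade-off is that awk holds every distinct line in memory instead of on disk.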

Brett

-- 
--  /) _ _ _/_/ / / /__ _ _//
-- /_)/</= / / (_(_//_//< ///



