[clug] finding duplicate sections in a text file

Wed Jan 22 05:42:07 UTC 2020

On 22/1/20 2:51 pm, steve jenkin via linux wrote:
> 	- Anyone have a better algorithm?
> 		E.g.  using ‘git’ or another Version Control System
> 		This is One Big File (done in sections), not many files in a directory, though I could try that next time.

Better?  Probably not.

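Here's my 5 minute solution: split the file into one file per line, let fdupes delete the duplicate files, then concatenate what's left: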

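#!/bin/bash

INFILE="$1"
WORKDIR="$(mktemp -d)"

# One file per input line; 8-digit numeric names make the glob below sort in line order
split --suffix-length=8 --lines=1 -d "${INFILE}" "${WORKDIR}/"
# Delete duplicates without prompting, keeping the first file fdupes lists in each set
fdupes -q -d -N "${WORKDIR}" >/dev/null
# Reassemble the surviving lines
cat "${WORKDIR}"/* > "${INFILE}.deduped"
rm -rf "${WORKDIR}"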

It does use a lot of files (one per input line). :-)
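
For comparison, here is a rough sketch that avoids the temporary files. It assumes line-level de-duplication is what's wanted (the same granularity as the split above) and that awk is available. The function name dedupe_lines is made up for illustration. It is a different technique from the fdupes approach: awk remembers every line it has seen in an in-memory array and prints a line only the first time it appears.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): print each distinct
# line once, in order of first appearance. Reads the named files, or stdin
# when no file is given.
dedupe_lines() {
    # seen[$0]++ evaluates to 0 (false) the first time a line is met, so
    # !seen[$0]++ is true only on first sight, and awk's default action
    # for a true pattern is to print the line.
    awk '!seen[$0]++' "$@"
}
```

It could be called as dedupe_lines "$INFILE" > "$INFILE.deduped". The trade-off is memory instead of files: every distinct line is held in the array until awk exits.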

Brett

-- 
--  /) _ _ _/_/ / / /__ _ _//
-- /_)/</= / / (_(_//_//< ///