[clug] finding duplicate sections in a text file

Steve Jenkin sjenkin at canb.auug.org.au
Wed Jan 22 09:40:00 UTC 2020


Yes, good observation.

There’s a table compression tool, pzip, part of the AT&T Software Technology (AST) toolkit, that I’ve never had reason to try but that has always looked interesting. They claim compression 2-5 times better than gzip on ‘tabular’ data.

It’s not general purpose: you’ve got to configure it, and the best doco I could find is the “Usage” in the source file.

<https://github.com/att/ast/blob/2016-01-10-beta/src/cmd/pzip/pzip.c>

Improving Table Compression with Combinatorial Optimization
<https://www.researchgate.net/publication/2872779_Improving_Table_Compression_with_Combinatorial_Optimization>
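A rough feel for why a table-aware compressor can beat plain gzip: in a fixed-width table, the bytes in any one column tend to resemble each other far more than the bytes along a row do. The sketch below is NOT pzip's algorithm or interface. As far as I understand the paper, it partitions columns into groups and compresses each group separately. This just transposes a made-up fixed-width table so each byte column is contiguous, then hands it to zlib. The table layout, column contents and all function names are assumptions for illustration only.

```python
import random
import zlib


def make_table(rows=2000, seed=0):
    """Build a synthetic fixed-width table (hypothetical data, illustration only)."""
    rng = random.Random(seed)
    records = []
    for i in range(rows):
        ident = f"{i:08d}"                         # steadily increasing ID column
        state = rng.choice(["ACT", "NSW", "VIC"])  # low-cardinality column
        amount = f"{rng.randrange(100000):06d}"    # noisy numeric column
        records.append((ident + state + amount).encode("ascii"))
    return records


def compress_row_major(records):
    """Compress the table as stored: one record after another."""
    return zlib.compress(b"".join(records), 9)


def compress_column_major(records):
    """Transpose so each byte column is contiguous, then compress."""
    width = len(records[0])
    data = b"".join(records)
    # data[c::width] picks byte c of every record, i.e. one byte column.
    transposed = b"".join(data[c::width] for c in range(width))
    return zlib.compress(transposed, 9)


def decompress_column_major(blob, width):
    """Invert compress_column_major, given the record width in bytes."""
    transposed = zlib.decompress(blob)
    rows = len(transposed) // width
    # Byte (row r, column c) sits at offset c * rows + r in the transposed data.
    data = bytes(transposed[c * rows + r]
                 for r in range(rows) for c in range(width))
    return [data[r * width:(r + 1) * width] for r in range(rows)]


if __name__ == "__main__":
    table = make_table()
    print("row-major bytes:   ", len(compress_row_major(table)))
    print("column-major bytes:", len(compress_column_major(table)))
```

Running it prints the two compressed sizes for comparison. How much the transposed layout helps, if at all, depends entirely on the data, so treat it as an experiment rather than a benchmark.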

> On 22 Jan 2020, at 17:05, Kim Holburn <kim.holburn at gmail.com> wrote:
> 
> I'm not providing a solution here, but this reminds me very much of how some compression algorithms work.  
> 

--
Steve Jenkin, IT Systems and Design 
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA

mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin



