[clug] finding duplicate sections in a text file
Steve Jenkin
sjenkin at canb.auug.org.au
Wed Jan 22 09:40:00 UTC 2020
Yes, good observation.
There’s a table compression tool, pzip, part of the AT&T Software Technology (AST) toolkit, that I’ve never had reason to try but that has always looked interesting. They claim it compresses ‘tabular’ data 2-5 times better than gzip.
It’s not general purpose: you have to configure it, and the best doco I could find is the “Usage” text in the source file:
<https://github.com/att/ast/blob/2016-01-10-beta/src/cmd/pzip/pzip.c>
There’s also a paper, “Improving Table Compression with Combinatorial Optimization”:
<https://www.researchgate.net/publication/2872779_Improving_Table_Compression_with_Combinatorial_Optimization>
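To give a feel for why tabular data can beat plain gzip, here is a rough Python sketch. It is not pzip's method (I haven't run pzip, and it needs a trained configuration); it only shows the general idea that storing fixed-width records column by column can help a gzip-style compressor. The record layout (8-digit id, one-letter category, 3-digit value) and all the function names are made up for illustration.

```python
import zlib


def make_rows(n):
    """Build n fixed-width records (hypothetical layout, 12 bytes each):
    8-digit id, 1-letter category, 3-digit value."""
    rows = []
    for i in range(n):
        category = "ABCD"[i % 4]
        value = (i * 37) % 1000
        rows.append(f"{i:08d}{category}{value:03d}".encode("ascii"))
    return rows


def row_major(rows):
    """Concatenate records as they would appear in a flat file."""
    return b"".join(rows)


def column_major(rows):
    """Transpose: all first bytes, then all second bytes, and so on."""
    width = len(rows[0])
    return b"".join(bytes(r[c] for r in rows) for c in range(width))


def restore_rows(data, width):
    """Undo column_major so the transform is shown to be lossless."""
    n = len(data) // width
    return [bytes(data[c * n + r] for c in range(width)) for r in range(n)]


rows = make_rows(5000)
by_row = zlib.compress(row_major(rows), 9)
by_col = zlib.compress(column_major(rows), 9)

print("raw bytes:          ", len(row_major(rows)))
print("zlib, row order:    ", len(by_row))
print("zlib, column order: ", len(by_col))
```

Each column here is either constant or repeats with a short period, so the column-ordered stream is much friendlier to zlib than the row-ordered one, where the id changes in every record. Real tables won't be this tidy, so treat the numbers as an illustration rather than a benchmark.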
> On 22 Jan 2020, at 17:05, Kim Holburn <kim.holburn at gmail.com> wrote:
>
> I'm not providing a solution here, but this reminds me very much of how some compression algorithms work.
>
--
Steve Jenkin, IT Systems and Design
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA
mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin