[clug] Compressing similar text files
sjenkin at canb.auug.org.au
Sun Sep 9 06:27:43 MDT 2012
Alastair D'Silva wrote on 9/09/12 8:38 PM:
> Before trying to mangle the data, take a look at lrzip, which sorts first
> (similar to rzip), then LZMAs the result.
Thanks to both Alastair and Brad for lrzip.
lrzip *.html (i.e. each file compressed independently) -> 42 MB.
Better than gzip and bzip2 on independent files, but not great.
But for this application, lrztar really shone: 1.8 MB.
[many files, with lots of content duplicated across files]
==> Thanks very much. A great solution, and No Coding Needed!
Telstra_RIM_exchanges steve$ du -hs tmp
Telstra_RIM_exchanges steve$ lrztar tmp/
tmp.tar - Compression Ratio: 90.568. Average Compression Speed: 2.705MB/s.
Total time: 00:01:01.636
Telstra_RIM_exchanges steve$ ls -lh tmp.tar.lrz
-rw-r--r-- 1 steve steve 1.8M 9 Sep 22:16 tmp.tar.lrz
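The gap between 42 MB and 1.8 MB comes from cross-file redundancy. Compressed one at a time, each file has to re-encode everything it shares with the others. lrztar tars the directory first, so the compressor sees one long stream and can refer back to text it has already seen. The sketch below illustrates the effect under some assumptions: gzip stands in for lrzip so that it runs anywhere, and the generated pages and file names are hypothetical. gzip only looks back 32 KB, so the test files are kept small. lrzip's rzip-style stage is what lets it find repeats much further apart in real data. Absolute sizes will differ from the numbers above; only the relative sizes matter.

```shell
#!/bin/sh
# Sketch: compress many near-identical files one by one, then as a
# single tar stream ("solid" compression), and compare total sizes.
# Assumption: gzip is used as a stand-in for lrzip.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/src"

# Hypothetical test data: ~11 KB of pseudo-random table rows
# (fixed seed, so reproducible) shared by every file.
awk 'BEGIN {
    srand(42)
    for (i = 0; i < 300; i++)
        printf "<tr><td>%d</td><td>%d</td></tr>\n", int(rand() * 100000), int(rand() * 100000)
}' > "$work/shared.txt"

# 20 hypothetical "pages" that differ only in their last line.
i=1
while [ "$i" -le 20 ]; do
    { cat "$work/shared.txt"; echo "<p>exchange $i</p>"; } > "$work/src/page$i.html"
    i=$((i + 1))
done

# Independent: compress each file on its own and sum the sizes.
indep=0
for f in "$work"/src/*.html; do
    indep=$((indep + $(gzip -9 -c "$f" | wc -c)))
done

# Solid: tar the directory first, then compress the whole stream once.
solid=$( (cd "$work" && tar cf - src) | gzip -9 -c | wc -c | tr -d ' ')

echo "independent: $indep bytes"
echo "tar + gzip:  $solid bytes"
```

With the shared block repeated 20 times, the solid archive should come out several times smaller than the sum of the individually compressed files, for the same reason lrztar beat per-file lrzip.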
Steve Jenkin, Info Tech, Systems and Design Specialist.
0412 786 915 (+61 412 786 915)
PO Box 48, Kippax ACT 2615, AUSTRALIA
sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin