[clug] Compressing similar text files

steve jenkin sjenkin at canb.auug.org.au
Sun Sep 9 06:27:43 MDT 2012


Alastair D'Silva wrote on 9/09/12 8:38 PM:

> Before trying to mangle the data, take a look at lrzip, which sorts first
> (similar to rzip), then LZMAs the result.
> http://ck.kolivas.org/apps/lrzip/

Thanks to both Alastair and Brad for lrzip.

lrzip *.html (i.e. each file compressed independently) -> 42MB.
Better than gzip and bzip2 on independent files, but not great.

But for this application, lrztar really shone: 1.8MB.
[many files, duplicates across files]

==> Thanks very much. A great solution, and No Coding Needed!
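
For anyone wondering why one archive beats per-file compression by so much, here is a rough sketch of the effect. It is not lrzip: it uses the LZMA module from the Python standard library, and the "pages" are made-up data (a shared pseudo-random template plus a short unique heading), so the function names and sizes are purely illustrative. Compressed one at a time, every page pays for the template again; compressed as one stream, the template is stored once and later copies become back-references.

```python
import lzma
import random


def make_pages(count=50, seed=0):
    """Build hypothetical pages that share one large template (assumed data)."""
    rng = random.Random(seed)
    # Pseudo-random bytes stand in for shared boilerplate, so the template
    # does not shrink on its own and only cross-page duplication can help.
    template = bytes(rng.getrandbits(8) for _ in range(20_000))
    return [template + f"<h1>Exchange {i}</h1>".encode() for i in range(count)]


def size_independent(pages):
    """Total size when every page is compressed as its own stream."""
    return sum(len(lzma.compress(page)) for page in pages)


def size_combined(pages):
    """Size when all pages are concatenated and compressed as one stream."""
    return len(lzma.compress(b"".join(pages)))


if __name__ == "__main__":
    pages = make_pages()
    print("independent:", size_independent(pages), "bytes")
    print("combined:   ", size_combined(pages), "bytes")
```

This only works while the duplicates fall inside the compressor's window (8 MiB at the default LZMA preset). As I understand it, finding matches much further apart than that is the part lrzip adds on top.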

======================================

Telstra_RIM_exchanges steve$ du -hs tmp
176M	tmp

Telstra_RIM_exchanges steve$ lrztar tmp/

tmp.tar - Compression Ratio: 90.568. Average Compression Speed:  2.705MB/s.
Total time: 00:01:01.636

Telstra_RIM_exchanges steve$ ls -lh  tmp.tar.lrz
-rw-r--r--  1 steve  steve   1.8M  9 Sep 22:16 tmp.tar.lrz

-- 
Steve Jenkin, Info Tech, Systems and Design Specialist.
0412 786 915 (+61 412 786 915)
PO Box 48, Kippax ACT 2615, AUSTRALIA

sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin

