[clug] Compressing similar text files

steve jenkin sjenkin at canb.auug.org.au
Sun Sep 9 04:37:43 MDT 2012

David Austin wrote on 9/09/12 8:19 PM:
> Would tar + (b|g)zip work for your application?
> David


Excellent question. One I should've looked at.

For other reasons I'd wanted to leave the individual files in the
directory and forgot to test this case.

It does seem that gzip and bzip2, as used by tar, do learn common strings [see

But the result is nowhere near the ~2MB I was hoping to get to.


Telstra_RIM_exchanges steve$ tar czf t.tgz *.html;ls -lh t.tgz
-rw-r--r--  1 steve  steve   6.8M  9 Sep 20:26 t.tgz

Telstra_RIM_exchanges steve$ tar cjf t.tbz *.html;ls -lh t.tbz
-rw-r--r--  1 steve  steve   5.0M  9 Sep 20:28 t.tbz

Compared to compressing the whole stream:
Telstra_RIM_exchanges steve$ cat *.html|gzip -c|wc -c

Telstra_RIM_exchanges steve$ cat *.html|bzip2 -c|wc -c
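
For anyone wanting to try the same comparison without the original pages,
here is a rough sketch. It builds a throwaway directory of synthetic,
near-identical HTML files (the page$i.html names and the row text are made
up for illustration, they are not the Telstra pages) and reports three
sizes: each file gzipped on its own, the files tarred then gzipped, and
the concatenated stream gzipped. It assumes a POSIX sh with mktemp -d, tar
and gzip available. The numbers it prints apply only to the synthetic data.

```shell
#!/bin/sh
# Sketch: per-file gzip vs tar+gzip vs one concatenated gzip stream.
# All file names and contents below are hypothetical test data.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Create 20 small files that share almost all of their content.
i=1
while [ "$i" -le 20 ]; do
    {
        j=1
        while [ "$j" -le 50 ]; do
            printf '<tr><td class="exchange">row %s of a shared template</td></tr>\n' "$j"
            j=$((j + 1))
        done
        printf '<p>unique page %s</p>\n' "$i"
    } > "$tmp/page$i.html"
    i=$((i + 1))
done

# 1. Each file compressed separately; sum the compressed sizes.
separate=0
for f in "$tmp"/*.html; do
    n=$(gzip -c "$f" | wc -c | tr -d ' ')
    separate=$((separate + n))
done

# 2. tar archive compressed as a whole (what "tar czf" does).
tarred=$(cd "$tmp" && tar czf - ./*.html | wc -c | tr -d ' ')

# 3. Plain concatenation compressed as a whole (no tar headers).
solid=$(cat "$tmp"/*.html | gzip -c | wc -c | tr -d ' ')

echo "separate files: $separate bytes"
echo "tar + gzip:     $tarred bytes"
echo "cat | gzip:     $solid bytes"
```

Swapping gzip for bzip2 (and czf for cjf) gives the bzip2 version of the
same test. Note that gzip only looks back 32KB, so on real files much
larger than that it cannot see a similar earlier file, and the gain from
compressing the whole stream is smaller than on this toy data.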

