[clug] Compressing similar text files

steve jenkin sjenkin at canb.auug.org.au
Sun Sep 9 04:37:43 MDT 2012


David Austin wrote on 9/09/12 8:19 PM:
> Would tar + (b|g)zip work for your application?
> 
> David


David,

Excellent question. One I should've looked at.

For other reasons I'd wanted to leave the individual files in the
directory and forgot to test this case.

It does seem that gzip and bzip2, used via tar, do learn common strings
[see below], but the results are nowhere near the ~2MB I was hoping to
get to.

cheers
steve

Telstra_RIM_exchanges steve$ tar czf t.tgz *.html;ls -lh t.tgz
-rw-r--r--  1 steve  steve   6.8M  9 Sep 20:26 t.tgz

Telstra_RIM_exchanges steve$ tar cjf t.tbz *.html;ls -lh t.tbz
-rw-r--r--  1 steve  steve   5.0M  9 Sep 20:28 t.tbz


Compared to compressing the whole stream:
Telstra_RIM_exchanges steve$ cat *.html|gzip -c|wc -c
 6,729,406
[6.4MB]

Telstra_RIM_exchanges steve$ cat *.html|bzip2 -c|wc -c
 5,006,262
[4.8MB]
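
One way to check how much of the saving comes from strings shared between files, rather than from redundancy inside each file, is to compare the sum of the individually compressed sizes with the size of the whole stream compressed once. What follows is only a minimal sketch, assuming a POSIX sh with gzip and bzip2 on the PATH. The function names sum_separate and size_stream are made up for illustration and are not part of the commands above.

```sh
#!/bin/sh
# Sketch only: sum_separate and size_stream are hypothetical helper names.

# Total bytes when every file is compressed on its own.
# Usage: sum_separate COMPRESSOR FILE...
sum_separate() {
    comp=$1; shift
    total=0
    for f in "$@"; do
        # Compress from stdin so no file name is stored in the output.
        n=$("$comp" -c < "$f" | wc -c)
        total=$((total + n))
    done
    echo "$total"
}

# Bytes when the files are concatenated and compressed as one stream.
# Usage: size_stream COMPRESSOR FILE...
size_stream() {
    comp=$1; shift
    n=$(cat "$@" | "$comp" -c | wc -c)
    # Arithmetic expansion strips the padding some wc versions add.
    echo $((n))
}

# Example use, run in the directory holding the files:
#   for c in gzip bzip2; do
#       echo "$c separate=$(sum_separate "$c" *.html) stream=$(size_stream "$c" *.html)"
#   done
```

If the two numbers come out close, the compressor is reusing very little across files. That would not be surprising for gzip, whose DEFLATE window only reaches back 32KB, or for bzip2, which works in independent blocks of at most 900KB.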


-- 
Steve Jenkin, Info Tech, Systems and Design Specialist.
0412 786 915 (+61 412 786 915)
PO Box 48, Kippax ACT 2615, AUSTRALIA

sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin

