[clug] Compressing similar text files
bradh at frogmouth.net
Sun Sep 9 04:20:05 MDT 2012
On Sunday 09 September 2012 20:03:11 steve jenkin wrote:
> For a project, I've downloaded ~5,000 files (3.25M lines) taking around
> They compress with gzip to 58MB, a ratio of around 4:1.
> bzip2 is very slightly better with default parameters.
[Stuff relating to the real question, which I'm not addressing at all, removed]
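So the limiting factor here is probably the window size that the compressor
"looks over" to find redundancy: gzip's deflate only searches the previous
32KB, and bzip2 works on independent blocks of at most 900KB.
Something with a larger window (ideally larger than the input data) will
presumably find more commonality between files.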
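
As a rough illustration of the window effect (not a measurement of your data),
here is a minimal Python sketch. Everything in it is an assumption on my part:
the corpus is synthetic (four near-identical "files" of about 1MB of hex text
each, concatenated), the function names are made up, and the standard library's
lzma module with its default 8MiB dictionary stands in for a larger-window
compressor, since rzip and lrzip are separate command-line tools.

```python
import bz2
import lzma
import random
import zlib


def build_corpus(copies=4, seed=0):
    """Concatenate several near-identical synthetic 'files'.

    Each file is a short unique header plus the same 1,000,000 characters
    of pseudo-random hex text, so repeats sit about 1MB apart: further
    than gzip's 32KB window and bzip2's 900KB block can reach.
    """
    rng = random.Random(seed)
    base = rng.getrandbits(4_000_000).to_bytes(500_000, "big").hex().encode()
    parts = []
    for i in range(copies):
        parts.append(b"# synthetic file %d\n" % i)
        parts.append(base)
        parts.append(b"\n")
    return b"".join(parts)


def compressed_sizes(data):
    """Return the compressed size in bytes for each stdlib compressor."""
    return {
        "gzip": len(zlib.compress(data, 9)),   # deflate, 32KB window
        "bzip2": len(bz2.compress(data, 9)),   # 900KB blocks
        "xz": len(lzma.compress(data)),        # default preset, 8MiB dictionary
    }


if __name__ == "__main__":
    corpus = build_corpus()
    print("input bytes:", len(corpus))
    for name, size in compressed_sizes(corpus).items():
        print(f"{name:>6}: {size} bytes")
```

On data shaped like this, the xz figure should come out well under half of the
other two, because only its dictionary spans more than one copy. Real files
that are merely similar will show a smaller gap.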
Can you try it with rzip (http://rzip.samba.org/) and lrzip (see