[clug] Substring repetition detection

Michael Still mikal at stillhq.com
Wed Jun 25 16:31:23 EST 2003


On Wed, 25 Jun 2003, James McNeill wrote:

> Is there a maximum length of the pattern?

Well, the large string I am currently working on is 1970 characters, but
others could be a lot bigger.

> Kim's right, all gzip does is find patterns in data. It'll find a lot more
> than just the linear repetitions that you want though.

Time for more information. Each "letter" in the string is an MD5 hash of a
USB packet sent between Windows and my webcam. I want to be able to
display the protocol dump, but have repetitious packet sequences
squelched.
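
A minimal sketch of that squelching idea, assuming each packet is available
as raw bytes and that only back-to-back repeats need collapsing. The names
packet_digest and squelch_repeats and the max_period limit are hypothetical,
chosen for illustration; this is a greedy approach, not what gzip does and
not necessarily the best one.

```python
import hashlib


def packet_digest(packet: bytes) -> str:
    """Reduce one packet to a single comparable "letter" (its MD5 hex digest)."""
    return hashlib.md5(packet).hexdigest()


def squelch_repeats(items, max_period=16):
    """Greedily collapse back-to-back repeated blocks.

    Returns a list of (block, count) tuples, where block is a tuple of
    consecutive items and count is how many times it repeats in a row.
    max_period (an assumed tuning knob) caps the block length tried.
    """
    items = list(items)
    n = len(items)
    out = []
    i = 0
    while i < n:
        best_period, best_count = 1, 1
        # Try every block length that could repeat at least twice from here.
        for period in range(1, min(max_period, (n - i) // 2) + 1):
            block = items[i:i + period]
            count = 1
            while items[i + count * period:i + (count + 1) * period] == block:
                count += 1
            # Keep the candidate that swallows the most items.
            if count > 1 and count * period > best_count * best_period:
                best_period, best_count = period, count
        out.append((tuple(items[i:i + best_period]), best_count))
        i += best_period * best_count
    return out


if __name__ == "__main__":
    # Made-up packet payloads, purely for demonstration.
    packets = [b"nfs", b"ping", b"pong"] + [b"ping"] * 200
    for block, count in squelch_repeats(packet_digest(p) for p in packets):
        print(f"{count:4d} x {' '.join(d[:8] for d in block)}")
```

The cost grows with max_period times the input length in the common case,
which should be fine for strings of a couple of thousand hashes.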

It would be cool for tcpdump too -- hey, there's a lot of NFS, and then a
ping / response, and then the same ping 200 times. It would make dumps a
lot more readable.

I must admit, digging through the gzip code doesn't sound like much fun...

Mikal

-- 

Michael Still (mikal at stillhq.com) | Stage 1: Steal underpants
http://www.stillhq.com            | Stage 2: ????
UTC + 10                          | Stage 3: Profit



