[clug] file parsing strangeness (Boris Rousak) (linux Digest, Vol 97, Issue 10, Message 8)

Miles Goodhew mgoodhew at gmail.com
Fri Jan 7 18:18:22 MST 2011


> Date: Fri, 07 Jan 2011 16:49:52 +0000
> From: Boris Rousak <b.rousak at qmul.ac.uk>
> Message-ID: <4D274430.6030306 at qmul.ac.uk>
> Can anyone suggest anything else I can try to do to this file to make it grep
> friendly? I'll probably end up doing the parsing using awk, but
> curiosity is getting the better of me :)

  Clutching at straws here, but I wonder if it's a wide-char Unicode
file? Tools that treat it as ASCII (such as "strings") will be
completely flummoxed by the data (they'll see it as many one-char
strings). Anything that PRINTS the text (like catting it to the
terminal) will "skip over" what it sees as NUL bytes, so it looks just
like normal text.
  Test: "% od -c <logfile | less" and you should see something like:

0000000   \0   T  \0   h  \0   i  \0   s  \0      \0   i  \0   s  \0
0000020   \0   s  \0   o  \0   m  \0   e  \0      \0   t  \0   e  \0   x
0000040   \0   t  \0  \n

(But with your log data).
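If you don't have such a file handy, the symptom is easy to reproduce (a sketch; "sample.log" is just a made-up name, and I'm using iconv itself to fabricate the big-endian wide-char data):

```shell
# Make a small UTF-16BE (big-endian, no BOM) file to stand in for the log.
printf 'This is some text\n' | iconv -f ascii -t UTF-16BE > sample.log

# grep can't find the word, because every character is preceded by a NUL byte:
grep -c 'some' sample.log || echo 'no match'

# od -c reveals the interleaved \0 bytes, as in the dump above:
od -c sample.log | head -3
```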

Fix: Use iconv: "% iconv -f UNICODEBIG -t ascii <infile >outfile"
(UNICODEBIG means big-endian; if the file came from a Windows tool it
may well be little-endian, in which case try UTF-16LE instead.)
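End to end, the fix looks like this (again a sketch with made-up filenames; I generate the wide-char input with iconv for demonstration):

```shell
# Fabricate a UTF-16BE "log" to convert.
printf 'This is some text\n' | iconv -f ascii -t UTF-16BE > widechar.log

# Convert it back to plain ASCII so grep/awk behave normally.
iconv -f UTF-16BE -t ascii < widechar.log > plain.log

# Now grep works as expected.
grep 'some' plain.log
```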

Hope that helps.


Miles Goodhew,
Executive Computer Scientist
