[clug] file parsing strangeness (Boris Rousak) (linux Digest, Vol 97, Issue 10, Message 8)
Miles Goodhew
mgoodhew at gmail.com
Fri Jan 7 18:18:22 MST 2011
Boris,
> Date: Fri, 07 Jan 2011 16:49:52 +0000
> From: Boris Rousak <b.rousak at qmul.ac.uk>
> Message-ID: <4D274430.6030306 at qmul.ac.uk>
...
> Can anyone suggest anything else I can try to do to this file to make it grep
> friendly? I'll probably end up doing the parsing using awk, but
> curiosity is getting the better of me :)
Weirdness.
Clutching at straws here, but I wonder if it's a wide-char Unicode
file (16 bits per character)? Tools that treat it as ASCII (such as
"strings") will be completely flummoxed by the data (they'll see it as
many one-char strings), and grep won't match an ordinary pattern
because there's a NUL byte sitting between every pair of letters.
Anything that PRINTS the text (like catting to a terminal) will "skip
over" what it sees as NUL bytes, so it looks just like "normal text".
Test: "% od -c <logfile | less" and you should see something like:
0000000  \0   T  \0   h  \0   i  \0   s  \0      \0   i  \0   s  \0
0000020  \0   s  \0   o  \0   m  \0   e  \0      \0   t  \0   e  \0   x
0000040  \0   t  \0  \n
(But with your log data).
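If you'd rather not eyeball the dump, the same check can be roughed out in Python. This is only a heuristic sketch: the function name looks_like_utf16be and the 0.9 / 0.1 thresholds are my own illustrative choices, and it assumes the text is mostly ASCII characters stored big-endian (NUL byte first, as in the dump above).

```python
def looks_like_utf16be(data: bytes, threshold: float = 0.9) -> bool:
    """Guess whether data is mostly-ASCII text stored as big-endian 16-bit chars.

    Heuristic only: in such data the high byte of each pair (the bytes at
    even offsets) is NUL, while the low byte almost never is.
    """
    high_bytes = data[0::2]   # bytes at even offsets
    low_bytes = data[1::2]    # bytes at odd offsets
    if not high_bytes or not low_bytes:
        return False
    nul_high = high_bytes.count(0) / len(high_bytes)
    nul_low = low_bytes.count(0) / len(low_bytes)
    # The 0.1 cut-off for low bytes is an arbitrary illustrative value.
    return nul_high >= threshold and nul_low < 0.1


sample = "This is some text\n".encode("utf-16-be")
print(looks_like_utf16be(sample))                   # True
print(looks_like_utf16be(b"This is some text\n"))   # False
```

To try it on the real log, read the first few KB in binary mode (open(path, "rb")) and pass that in.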
Fix: Use iconv: "% iconv -f UNICODEBIG -t ascii <infile >outfile"
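If iconv isn't handy, roughly the same conversion can be sketched in Python. Assumptions here: the file really is big-endian UTF-16 (that's still a guess on my part), the function name utf16be_to_ascii is made up for illustration, and non-ASCII characters get replaced with "?" where plain iconv would stop with an error.

```python
def utf16be_to_ascii(raw: bytes) -> bytes:
    """Decode big-endian UTF-16 bytes and re-encode them as ASCII."""
    text = raw.decode("utf-16-be")
    text = text.lstrip("\ufeff")   # drop a leading byte-order mark, if any
    # errors="replace" turns anything outside ASCII into "?"
    return text.encode("ascii", errors="replace")


sample = "This is some text\n".encode("utf-16-be")
print(utf16be_to_ascii(sample))   # b'This is some text\n'
```

For a real file, read it with open(infile, "rb"), pass the bytes through the function, and write the result with open(outfile, "wb"); the output should then behave with grep and awk.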
Hope that helps.
M0les.
--
Miles Goodhew,
Executive Computer Scientist