[clug] Help explaining a sed command to delete last 10 lines from a file

Tue Jan 14 18:08:17 MST 2014

Hi Steve,

> I've ~250,000 email messages I want to trim off some XML added to the
> end of each message, then compute the hash sum of, so I can identify
> duplicates... "Just Works" is good. Not horrendously slow is better.

If it's a fixed number of lines (let's say 3), just pipe through

  head -n-3

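- this will output all but the last three lines of the file (the
negative count is a GNU head extension, so it may not work with other
implementations of head).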
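
As a rough sketch of how that could feed into the hashing step: the
sample message below is made up, and sha256sum is only my assumption for
the "hash sum" (any checksum tool would slot in the same way). It assumes
GNU coreutils.

```shell
# Hypothetical sample message: two body lines followed by a three-line trailer.
msg=$(mktemp)
printf '%s\n' 'Subject: hi' 'body text' '<x>' '<y>' '</x>' > "$msg"

# GNU head: a negative count prints everything except the last 3 lines.
trimmed=$(head -n -3 "$msg")
printf '%s\n' "$trimmed"

# Hash only the trimmed content, so the trailer does not affect the sum.
# sha256sum is an assumed choice of hash tool, not something from the question.
trimmed_sum=$(head -n -3 "$msg" | sha256sum | cut -d' ' -f1)
printf '%s\n' "$trimmed_sum"

rm -f "$msg"
```

Two messages that differ only in those last three lines should then
produce the same sum.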

Alternatively, if you have a pattern (let's say '<tag>') that you can
match to detect the start of the XML, just pipe through:

  awk '/<tag>/ {exit} {print}'

- this awk script tries to match the pattern ('<tag>') against each
line of the file. If the match succeeds, the pattern's action is
executed, in this case exiting the awk script, so neither the matching
line nor anything after it is output. Otherwise, awk carries on to the
second rule, which has no pattern and so prints every line it sees.
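
Putting that together with the duplicate hunt, something along these
lines might do. It is only a sketch: the directory, the .eml file names
and the '<tag>' marker are placeholders, sha256sum is an assumed choice
of hash, and the uniq options used (-w and -D) are GNU extensions.

```shell
# Hypothetical mail directory: a.eml and b.eml differ only in the XML trailer.
dir=$(mktemp -d)
printf '%s\n' 'hello' 'world' '<tag>' 'id=1' '</tag>' > "$dir/a.eml"
printf '%s\n' 'hello' 'world' '<tag>' 'id=2' '</tag>' > "$dir/b.eml"
printf '%s\n' 'other' '<tag>' 'id=3' '</tag>' > "$dir/c.eml"

# Emit "<hash>  <file>" per message, hashing only the lines before <tag>.
sums=$(for f in "$dir"/*.eml; do
  h=$(awk '/<tag>/ {exit} {print}' "$f" | sha256sum | cut -d' ' -f1)
  printf '%s  %s\n' "$h" "${f##*/}"
done)

# Compare only the first 64 characters (the hash) and keep repeated groups;
# the file name starts at column 67, after the hash and two spaces.
dupes=$(printf '%s\n' "$sums" | sort | uniq -w 64 -D | cut -c67-)
printf '%s\n' "$dupes"

rm -rf "$dir"
```

This starts one awk and one sha256sum per message, which is simple rather
than fast; whether that is quick enough for ~250,000 messages is
something you would have to time on your own machine.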

Hope this helps,

Jeremy