[clug] Splitting a file using bash

Hal Ashburner hal at ashburner.info
Sun Sep 14 18:54:04 MDT 2014


I want to turn stdout into multiple files. I have marker lines where I
would like the splits to happen.

stdout:
data
data
data
this_is_a_marker
data2
data2
data2
data2


Except the real output is very large, and its marker lines contain the text start_report.
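For comparison, GNU coreutils (standard on RHEL/CentOS) ships csplit, which cuts its input at lines matching a regular expression. This is a minimal sketch, assuming GNU csplit and GNU sed and a marker containing start_report as in the function below. split_with_csplit is a hypothetical name. csplit leaves each marker line at the top of the following piece, so a sed pass deletes it afterwards.

```shell
# Hypothetical helper: split_with_csplit FILE
# Sketch only: assumes GNU csplit/sed, markers containing "start_report",
# and fewer than ten pieces. Writes FILE.0 (text before the first marker),
# FILE.1, ... next to FILE, using the same names as split_reports.
split_with_csplit()
{
    local input_file="$1"
    local piece
    # -s: don't print piece sizes
    # -f: prefix for output names; -n 1: one-digit suffix (.0, .1, ...)
    # '{*}': repeat the preceding pattern for as many markers as exist
    csplit -s -f "${input_file}." -n 1 "$input_file" '/start_report/' '{*}'
    # Every piece after .0 begins with its marker line; drop that line.
    for piece in "${input_file}".[1-9]; do
        [[ -e $piece ]] || continue   # glob matched nothing
        sed -i '1d' "$piece"
    done
}
```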

I have this function, which works but is slow.
Better ideas would include:
1) rewrite everything in another language, e.g. Python
2) rewrite split_reports in C
3) ask CLUG if anyone has a faster way of doing this using standard
bash 4.1.2 or older on a Red Hat Enterprise Linux/CentOS system.
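One direction for option 3 that stays within the stock RHEL/CentOS toolset (though it is awk rather than pure bash) is to let a single awk process do the whole split. This is a minimal sketch, assuming the marker text is start_report as in the function below. split_reports_awk is a hypothetical name. awk keeps each output file open between prints, so nothing is reopened per line, and it can read straight from a pipe.

```shell
# Hypothetical helper: split_reports_awk BASE [FILE]
# Sketch only: reads FILE (or stdin if omitted) and writes BASE.0, BASE.1, ...
# Marker lines (assumed to contain "start_report") are not copied.
split_reports_awk()
{
    local base="$1"
    shift
    awk -v base="$base" '
        # Create/truncate the first piece up front, as the bash loop does.
        BEGIN { n = 0; file = base "." n; printf "" > file }
        /start_report/ {
            close(file)              # finished with this piece
            n++
            file = base "." n
            printf "" > file         # create/truncate the next piece, even if it stays empty
            next                     # do not copy the marker line
        }
        { print > file }             # stream stays open, so no reopen per line
    ' "$@"
}
```

For example, some_report_generator | split_reports_awk /tmp/report (where some_report_generator stands in for whatever produces the output) would write /tmp/report.0, /tmp/report.1 and so on, after which the same mv steps as in the function below could rename them.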

Yeah, I just asked about optimising a shell script. I already feel bad,
and you don't have to point out that I should. ;-)

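function split_reports()
{
    local input_file="$1"
    local first_report="$2"
    local second_report="$3"
    # generalise the above using $@ if more than 2 needed
    local breaks_seen=0
    local line=""
    # start with an empty first piece so a leftover ${input_file}.0
    # from an earlier run is not appended to
    : > "${input_file}.0"
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
    # and the -n test still handles a last line with no trailing newline
    while IFS= read -r line || [[ -n $line ]]
    do
        if [[ $line =~ start_report ]]; then
            breaks_seen=$((breaks_seen + 1))
            # clobber it before using it
            # don't write out the marker
            : > "${input_file}.${breaks_seen}"
        else
            case $breaks_seen in
                # printf, unlike echo, won't treat a line such as "-n" as an option
                [01]) printf '%s\n' "${line}" >> "${input_file}.${breaks_seen}" ;;
                *) echo "error breaks_seen is ${breaks_seen} - should be 0-1" >&2 ;;
            esac
        fi
    done < "${input_file}"

    mv "${input_file}" "${input_file}.orig"
    mv "${input_file}.0" "${first_report}"
    mv "${input_file}.1" "${second_report}"
}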
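Much of the cost in the loop above is likely that >> reopens the output file for every single line. This is a sketch that keeps the same pure-bash loop but opens each piece only once, on file descriptor 3, and also does the $@ generalisation mentioned in the comment. split_reports_fd is a hypothetical name, and the glob match *start_report* stands in for the regex test (equivalent for a literal marker).

```shell
# Hypothetical variant: split_reports_fd INPUT REPORT0 REPORT1 [REPORT2 ...]
# Sketch only: it takes over file descriptor 3 in the calling shell while it runs.
split_reports_fd()
{
    local input_file="$1"
    shift
    local -a reports=("$@")   # one output name per section, in order
    local breaks_seen=0
    local line i

    # Open the first piece once; later writes reuse the open descriptor.
    exec 3> "${input_file}.0"
    while IFS= read -r line || [[ -n $line ]]
    do
        if [[ $line == *start_report* ]]; then
            breaks_seen=$((breaks_seen + 1))
            # Re-pointing fd 3 closes the previous piece and truncates the next.
            exec 3> "${input_file}.${breaks_seen}"
        else
            printf '%s\n' "${line}" >&3
        fi
    done < "${input_file}"
    exec 3>&-   # close the last piece

    mv "${input_file}" "${input_file}.orig"
    for ((i = 0; i < ${#reports[@]}; i++))
    do
        mv "${input_file}.${i}" "${reports[i]}"
    done
}
```

Called as split_reports_fd big_output first.txt second.txt, it should leave the same files as split_reports. Every line still passes through bash's read, so for very large inputs a single awk or csplit process is likely to be faster still.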


More information about the linux mailing list