[clug] Splitting a file using bash

Andrew Janke a.janke at gmail.com
Sun Sep 14 18:57:41 MDT 2014


Isn't this what csplit is made for?
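
A rough sketch of how that might look, assuming GNU csplit and GNU sed, and
using the marker from the sample below. The names input.txt and report. are
made up for illustration, and the sample data stands in for the real output.

```shell
#!/bin/bash
# Hypothetical sample data standing in for the real (very large) output.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' data data data this_is_a_marker data2 data2 data2 data2 > input.txt

# Split before every line matching the marker; '{*}' repeats the pattern
# as many times as it matches. -s suppresses the byte counts and -f sets
# the output prefix, so the pieces are named report.00, report.01, ...
csplit -s -f report. input.txt '/this_is_a_marker/' '{*}'

# Every piece after the first starts with the marker line; drop it.
for piece in report.*; do
    if [ "$piece" != report.00 ]; then
        sed -i '1d' "$piece"
    fi
done
```

GNU csplit reads standard input when the file operand is "-", so the output
could probably be piped straight in rather than saved first. Newer coreutils
have a --suppress-matched option that drops the marker line directly, but as
far as I know it is not in the version shipped with RHEL/CentOS 6, hence the
sed pass here.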


a

On 15 September 2014 10:54, Hal Ashburner <hal at ashburner.info> wrote:
> I want to turn stdout into multiple files. I have markers where I
> would like this to happen.
>
> stdout:
> data
> data
> data
> this_is_a_marker
> data2
> data2
> data2
> data2
>
>
> only it's very large
>
> I have this function which works, but is slow.
> Better ideas would include:
> 1) re-write everything in another language eg python
> 2) re-write split reports in C
> 3) ask CLUG if anyone has a faster way of doing this using standard
> bash 4.1.2 or older on a redhat enterprise/centos system.
>
> Yeah I just asked about optimising a shell script, I already feel bad
> and you don't have to point out that I should. ;-)
>
> function split_reports()
> {
>     local input_file="$1"
>     local first_report="$2"
>     local second_report="$3"
>     # generalise the above using $@ if more than 2 needed
>     local breaks_seen=0
>     local line=""
>     # IFS= and -r keep leading whitespace and backslashes intact
>     while IFS= read -r line
>     do
>         if [[ $line =~ start_report ]]; then
>             breaks_seen=$((breaks_seen + 1))
>             # clobber it before using it
>             # don't write out the marker
>             : > "${input_file}.${breaks_seen}"
>         else
>             case $breaks_seen in
>                 [0-9]) echo "${line}" >> "${input_file}.${breaks_seen}" ;;
>                 *) echo_stderr "error breaks_seen is ${breaks_seen} - should be 0-1";;
>             esac
>         fi
>     done < "${input_file}"
>
>     mv "${input_file}" "${input_file}.orig"
>     mv "${input_file}.0" "${first_report}"
>     mv "${input_file}.1" "${second_report}"
> }

