[clug] Thursday afternoon Perl/bash golf time

Scott Ferguson scott.ferguson.clug at gmail.com
Thu Jun 11 02:53:53 MDT 2015


On 11/06/15 16:26, Andrew Janke wrote:
> I have a perl script, one of its jobs is to concatenate a large number
> of (binary) files together.
> 
> I cheat and use the shell
> 
> # create rawfile
> @args = ('cat', @catfiles, '>', "$opt{'tmpdir'}/all.raw");
> &do_cmd(join(' ', @args));
> 
> (from: http://git.io/vIM2p)
> 
> Of course now someone tried to use it with a large number of input
> files and it failed. So what to do?
> 
> 1) for($i=0; $i<=$#catfiles; $i+=100){
>          system("cat @catfiles[$i..$i+99] > /tmp/cat-$i");
>          }
> 
>      # cat the bits.
> 
> 
> 2) Do it in bash
> 
>      for i in $(cat infiles.txt); do cat $i >> result; done

And this fails??

Are there spaces in any of those filenames?
Are they just filenames or paths as well? If they're paths, why not use
find? (Then you could eliminate the loop around cat, which shouldn't
make any real difference anyway, as appending is what cat does best.)
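
If they're paths under a common directory, a sketch along these lines
would sidestep the long command line and cope with spaces in the names
(the /data/infiles directory and the *.raw pattern are only placeholders
for whatever the real layout is):

    # let find batch the arguments itself; the single redirection on
    # find means every cat it spawns appends to the same open file
    find /data/infiles -type f -name '*.raw' -exec cat {} + > all.raw

The -exec ... {} + form hands cat as many paths per invocation as the
system allows, so ARG_MAX never enters into it. Note that find emits
paths in directory order, so this only fits if the order of the pieces
doesn't matter or can be fixed with a sort.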

> 
> 
> 3) Do it in perl
> 
> 
> 4) Something far more clever.
> 
> Suggestions welcome, my brain is slowing.
> 
> ta
> 
> 
> 
> a
> 


If you find it (the bash version) fails with large numbers of files, it
may be ARG_MAX; you should be able to get around that using xargs. The
next issue may be speed. As it's mostly I/O, you could put the result
file on a different drive, or use a virtual file. Either way I'd
consider preallocating space first.
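
Assuming the list of inputs sits one path per line in a file
(infiles.txt here is just the name from the loop above), a minimal
sketch of the xargs route would be:

    # xargs splits the list into however many cat invocations are
    # needed, each comfortably under ARG_MAX; the single redirection
    # means they all write into the same result file, in list order
    xargs -d '\n' cat < infiles.txt > result

The -d '\n' is GNU xargs and keeps spaces in the filenames intact (it
still assumes no newlines in the names); plain xargs would split on
those spaces. Unlike the find version, this preserves the order of the
list.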

I'm curious.

Kind regards

