[clug] Thursday afternoon Perl/bash golf time

Scott Ferguson scott.ferguson.clug at gmail.com
Thu Jun 11 05:31:21 MDT 2015


On 11/06/15 20:07, Andrew Janke wrote:
>> If you find it (the BASH) fails with large numbers it may be ARG_MAX -
>> you should be able to get around that using xargs, the next issue may be
>> speed. As it's mostly I/O you could use a different drive for the
>> $Result, or a virtual file. Either way I'd consider preallocating space
>> first.
> 
> The files will typically be in the 8-30MB size, but there may be
> thousands of them. 

I'm guessing that's where your shell scripts are failing (thousands of
files in the list) - a for loop over the whole list would use a lot of
RAM, but from dodgy memory the shell built-ins don't hit the same
ARG_MAX limit that external commands like ls or find do when you hand
them a *.
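
You can check the actual limit on your box with:

    getconf ARG_MAX    # max bytes of arguments (plus environment) one exec can take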

A while-read 'might'(?) be better than a for:-

    while read -r i ; do stuff "$i" ; done < files.txt
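
Or feed it straight from find, so there's no intermediate list and
spaces in filenames don't bite (the path and pattern here are just made
up for the example, and 'stuff' is whatever you're doing per file):

    find /path/to/tiffs -name '*.tif' -print0 |
        while IFS= read -r -d '' f ; do
            stuff "$f"    # placeholder for the real per-file work
        done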

xargs would be my choice as it works through the list in chunks and
avoids the trap of having the shell hold everything in RAM/(shudder)tmp.
The size of the actual files shouldn't be a problem, though it may slow
other processes. Given the job I'd:-
- do what you've done (ask others, 'cause I'm a terrible scripter);
- preallocate space (belt and suspenders?) with a find/xargs loop, then
use a Perl snippet driven by xargs, writing to a virtual file system
(especially if I needed to parse the output).
Kind of like Hal's suggestion - xargs will chunk the arguments into
multiple cats, I guess (rough sketch below).
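
Roughly the shape of the xargs version I mean - $Result as in your
earlier mail, the path and pattern made up for the example:

    find /path/to/tiffs -name '*.tif' -print0 | xargs -0 cat >> "$Result"

xargs batches the names under the exec limit and runs cat once per
batch, so nothing ever has to hold the whole list as one argument
vector. (If the concatenation order matters you'd want a sort -z between
the find and the xargs.)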

Disclaimer - mainly I'm interested in hearing why my thoughts are dumb
so I can be less incompetent at scripting, not because I think I've got
any good ideas.

Are they sparse files?


> It's serial microscopy data that is being converted
> to HDF5 format for later analysis. This data is typically exported
> from scanners in OME Tiff format.

Nice job. Thanks for sharing.
> 
> 
> a
> 


Kind regards


