[clug] Thursday afternoon Perl/bash golf time
Scott Ferguson
scott.ferguson.clug at gmail.com
Thu Jun 11 05:31:21 MDT 2015
On 11/06/15 20:07, Andrew Janke wrote:
>> If you find it (the BASH) fails with large numbers it may be ARG_MAX -
>> you should be able to get around that using xargs, the next issue may be
>> speed. As it's mostly I/O you could use a different drive for the
>> $Result, or a virtual file. Either way I'd consider preallocating space
>> first.
>
> The files will typically be in the 8-30MB size, but there may be
> thousands of them.
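I'm guessing that's where your shell scripts are failing (thousands of
files in the list) - the for loop would use a lot of RAM, but from dodgy
memory built-ins don't have the same ARG_MAX limits as external commands
such as ls or find when they're handed a * glob (the limit bites when the
expanded list is passed to a new process).
This 'might'(?) be better than a for:-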
while IFS= read -r i ; do stuff "$i" ; done < files.txt
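
A slightly fuller sketch of that loop, in case it helps. process_file is
a made-up stand-in for `stuff` (here it only reports a byte count), and
the demo files and files.txt are created on the spot purely so the
snippet runs by itself - none of these names come from your script:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `stuff`: just report the file size in bytes.
process_file() {
    wc -c < "$1"
}

# Build a tiny demo list (in real use files.txt would already exist).
workdir=$(mktemp -d)
printf 'aaaa' > "$workdir/slice 001.tif"
printf 'bb'   > "$workdir/slice 002.tif"
printf '%s\n' "$workdir"/*.tif > "$workdir/files.txt"

total=0
# IFS= and -r keep leading spaces and backslashes in the names intact;
# only one name is held by the shell at a time.
while IFS= read -r f; do
    size=$(process_file "$f")
    total=$((total + size))
done < "$workdir/files.txt"

echo "total bytes: $total"
rm -rf "$workdir"
```

The list is read one line at a time, so names containing newlines would
still trip it up - that's the case where NUL-separated names are safer.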
xargs would be my choice as it'd work through the list bit by bit and
avoid the trap of having the shell hold everything in RAM/(shudder)tmp.
The size of the actual files shouldn't be a problem, though it may slow
other processes. Given the process I'd:-
;do what you've done (ask others 'cause I'm a terrible scripter)
;preallocate space (belt and suspenders?) with a find/xargs loop, then
use a perl snippet in a for loop with xargs to a virtual file system
(especially if I needed to parse the output). Kind of like Hal's
suggestion (xargs will chunk the arguments across multiple cats - I guess).
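
To show what I mean by the chunking, a rough sketch only - the part*.dat
files, result.out and the temp directory are invented for the demo, and
I'm assuming a find, sort and xargs that understand NUL-separated names
(-print0, -z, -0), as the GNU ones do:

```shell
#!/usr/bin/env bash
# Demo data: five small files standing in for the real (much larger) ones.
workdir=$(mktemp -d)
for n in 1 2 3 4 5; do
    printf 'data%s\n' "$n" > "$workdir/part$n.dat"
done
result="$workdir/result.out"

# The limit on the combined size of arguments plus environment.
echo "ARG_MAX here: $(getconf ARG_MAX)"

# NUL-terminated names survive spaces and newlines. xargs packs as many
# names as fit onto each cat command line and starts another cat if
# needed; every cat writes to the same redirected output.
find "$workdir" -name 'part*.dat' -print0 | sort -z | xargs -0 cat > "$result"
lines=$(wc -l < "$result")

# Force tiny batches (-n 2) just to make the chunking visible:
# each sh invocation prints how many names it was given.
batches=$(find "$workdir" -name 'part*.dat' -print0 |
    xargs -0 -n 2 sh -c 'echo "$#"' sh | wc -l)

echo "lines in result: $lines, batches of two: $batches"
rm -rf "$workdir"
```

The sort is only there so the pieces are concatenated in a predictable
order; without -n, xargs picks the batch size itself from the available
command-line space.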
Disclaimer - mainly I'm interested in why my thoughts are dumb so I can
be less incompetent at scripting - not because I think I've got any good
ideas.
Are they sparse files?
> It's serial microscopy data that is being converted
> to HDF5 format for later analysis. This data is typically exported
> from scanners in OME Tiff format.
Nice job. Thanks for sharing.
>
>
> a
>
Kind regards