[clug] using 'dd' in a pipeline

Mon Nov 24 02:03:57 MST 2014

On Mon, 24 Nov 2014 13:56:45 +1100
steve jenkin <sjenkin at canb.auug.org.au> wrote:

> I’ve been looking at deliberate MD5 collisions and went to write a simple script based on
> ‘dd’, only to fall flat on my face - detecting ‘end of file’ from STDIN.
> 
> ‘dd’ doesn’t consider EOF as an error and returns “success”.
> 
> I could use dd’s “iseek”, but that ties me to named files in the file system :(
> I’d prefer to use a pipeline - current script below, but it requires I first know the size of
> the file, exactly the same issue as ‘iseek’.
> 
> One option is to catch the output of ‘dd’ in a temporary file, then test for its size, ending
> when the temp file is zero length.
> 
> I tried using the bash ‘read’ with ‘-n 0’, but that reads an entire line [not good for a
> binary file]. ‘IFS="" read -r -n 1’ runs, but consumes a byte of the input and discards binary
> ‘\0’ bytes :(
> 
> The stat() and read() system calls either don’t report EOF, or return zero-length at EOF.
> 
> I could test if a program that does a read() of zero-length, then uses ‘select’ with a short
> timeout, would reliably detect EOF on STDIN if it was a pipe or file. It doesn’t seem
> reliable & portable to me, but it may be.
> 
> ioctl() and fcntl() didn’t seem to have
> 
> Anyone got ideas that I can use in a shell script? I’d prefer not to have to write code :)
> There may already be extended versions of ‘dd’ that do this (haven’t checked these: dc3dd
> dcfldd dd_rescue ddd)
> 
> Or suggestions on existing tools, or on how to implement a simple test for EOF on STDIN (pipe or
> file) in Perl, PHP or other scripting languages.
> 
> Thanks in Advance
> steve
> 
> =====================
> Two files with same MD5 hash (e06723d4961a0a3f950e7786f3766338)
> <http://natmchugh.blogspot.co.at/2014/10/how-i-created-two-images-with-same-md5.html>
> 
> Files:
> brown.jpg <http://www.fishtrap.co.uk/james.jpg.coll>
> white.jpg <http://www.fishtrap.co.uk/barry.jpg.coll>
> =====================
> 
> #!/bin/bash
> 
> bs=4096
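> blocks=1
> 
> for file in white.jpg brown.jpg
> do
>   echo "$file blocksize=${bs}"
>   # File size in whole blocks, rounded up so a partial final block is counted
>   size=$(ls -l "$file" | awk -v bs="$bs" '{print int(($5 + bs - 1) / bs)}')
> 
>   cat "$file" |
>   while [[ $size -gt 0 ]]
>   do
>     echo -n "$size "
>     # Hash the next $blocks block(s) read from the shared pipe
>     dd bs=$bs count=$blocks 2>/dev/null | md5sum -
>     size=$(( size - blocks ))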
>   done
>   echo
> done
>

Interesting thread. Have you considered od?
=============================================================
#!/bin/sh

DIFF=/usr/bin/diff
OD=/usr/bin/od

BARRY=/home/owen/Downloads/barry.jpg.bin
JAMES=/home/owen/Downloads/james.jpg.bin

# Skip the first 100 bytes (-j100), dump the next 50 (-N50), then compare the dumps
$OD -j100 -N50 "$BARRY" > barry.txt
$OD -j100 -N50 "$JAMES" > james.txt
$DIFF barry.txt james.txt
=============================================================

That just takes a 50-byte sample of each file, starting at byte offset 100, and compares the two.
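
On the original EOF question, here is a rough sketch along the lines of the temp-file idea mentioned above: let dd write each block to a temporary file and stop as soon as that file comes back empty, so the input size never has to be known in advance. The function name hash_blocks is made up for illustration, and iflag=fullblock is a GNU dd extension (it keeps dd from returning a short block when reading from a pipe), so treat this as one possible approach on a GNU system rather than a portable answer.

```shell
# hash_blocks [blocksize]
# Hypothetical helper: prints "<block number> <md5>" for each block of stdin.
# Assumes GNU dd (iflag=fullblock), mktemp and md5sum are available.
hash_blocks() {
  local bs=${1:-4096} n=0 tmp
  tmp=$(mktemp) || return 1
  while :; do
    # Read at most one full block from the shared stdin into the temp file.
    dd bs="$bs" count=1 iflag=fullblock 2>/dev/null >"$tmp"
    # An empty temp file means dd found nothing left to read: EOF.
    [ -s "$tmp" ] || break
    n=$((n + 1))
    printf '%s %s\n' "$n" "$(md5sum <"$tmp" | cut -d' ' -f1)"
  done
  rm -f "$tmp"
}
```

Something like `cat white.jpg | hash_blocks 4096 > white.md5`, the same for brown.jpg, and then a diff of the two listings should show which blocks differ. The final block may be shorter than the block size; it is still hashed, and the loop ends on the following (empty) read.
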
-- 

Owen