[clug] Unique Id's and CD's

Alex Satrapa alexsatrapa at mac.com
Fri May 8 00:12:06 GMT 2009


On 08/05/2009, at 09:08 , Robert Edwards wrote:

> If you want an ID for _any_ data CD you will probably need to do some
> sort of checksum (md5, sha1 etc.) across the entire disk and live
> with the fact that collisions will occur with a small probability.

You could get away with generating both an MD5 and a SHA1 checksum of
the first 512kB. Either of them might collide, but I doubt both would
collide for the same pair of discs. These days 512kB is nothing: read
a few hundred blocks from the CD and there you have it.
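
A rough sketch of that idea in Python, in case it helps. The function
name and the chunk size are my own choices for illustration, and the
path is whatever device node or image file you can read the disc from:

```python
import hashlib

PREFIX_BYTES = 512 * 1024  # the "first 512kB" suggested above


def prefix_digests(path, nbytes=PREFIX_BYTES):
    """Return (md5_hex, sha1_hex) computed over the first nbytes of path."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    remaining = nbytes
    with open(path, "rb") as f:
        while remaining > 0:
            # Read in modest chunks; 64 KiB is an arbitrary choice.
            chunk = f.read(min(64 * 1024, remaining))
            if not chunk:
                break  # the disc or image is shorter than nbytes
            md5.update(chunk)
            sha1.update(chunk)
            remaining -= len(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

You would call it as prefix_digests("/dev/sr0") or similar (the device
name is an assumption and varies between systems) and store the pair of
digests together as the disc's ID.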

It may even be possible to include whatever ATIP data is available
(dye type, manufacturer, etc.). It is unreliable as an information
source, but still a possible seed if you only buy discs in batches of
10 or 20. How much information can you get about the media itself?

You may even find that just using the table of contents for each
session on the disc is enough to ensure uniqueness for your problem
space. I doubt that two discs with exactly the same directory
structure will turn out to hold different data. In that case you could
just checksum the output of "ls -lR", which is far smaller than the
entire data content of the disc but should still distinguish one file
system from another in practice.
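
Something along these lines would do it, assuming the disc is already
mounted and a Unix "ls" is on the PATH. The function name is made up,
and pinning LC_ALL and TZ is my own precaution so that the listing does
not vary with the locale or timezone of the machine reading the disc:

```python
import hashlib
import os
import subprocess


def listing_digests(mount_point):
    """Return (md5_hex, sha1_hex) of the "ls -lR" output for mount_point."""
    # Pin locale and timezone so date and name formatting stay the same
    # from one machine to the next.
    env = dict(os.environ, LC_ALL="C", TZ="UTC")
    listing = subprocess.run(
        ["ls", "-lR"],
        cwd=mount_point,      # list "." so the mount path is not in the output
        env=env,
        check=True,
        capture_output=True,
    ).stdout
    return hashlib.md5(listing).hexdigest(), hashlib.sha1(listing).hexdigest()
```

Note that the long listing also contains owner and group names, which
depend on how the disc was mounted, so the IDs are only comparable
between machines that mount discs the same way.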

The MD5+SHA1 of "ls -lR" should work to uniquely identify a version of
a CD that is regularly updated by a vendor, for example. They might
ship out a disc labelled "WONDERFUL-UPDATES" which contains updates to
their "Wonderful™" product. The file structure might look exactly the
same each time: one "SETUP.EXE" file in the root of the file system.
You'd expect the datestamp and file size to change between updates,
and the "ls -lR" output would pick that up, since the long listing
includes both.

To future-proof the system, you might run your "ls -lR" with specific
options for which columns and number formats to display, just in case
a future version of ls decides to default to "human readable" output
(e.g. 3.4M instead of 3400000).
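
One way to take ls out of the picture entirely is to build the listing
yourself with a fixed set of columns. This is only a sketch: the
function name and the choice of columns (relative path, size in bytes,
modification time in whole seconds) are my assumptions about what
matters. It also sidesteps the way "ls -l" prints recent timestamps
with a time of day and older ones with a year, which could make the
same disc list differently some months later.

```python
import hashlib
import os


def tree_fingerprint(root):
    """Return (md5_hex, sha1_hex) of a fixed-format listing of root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a fixed order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            # One line per file: path, exact byte count, mtime in seconds.
            lines.append(f"{rel}\t{st.st_size}\t{int(st.st_mtime)}\n")
    # Empty directories are not recorded in this sketch.
    data = "".join(lines).encode("utf-8", "surrogateescape")
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()
```

With the "WONDERFUL-UPDATES" example, two releases that each hold a
single SETUP.EXE would still get different fingerprints as long as the
size or the datestamp differs.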

Alex
