[clug] Unique Id's and CD's
alexsatrapa at mac.com
Fri May 8 00:12:06 GMT 2009
On 08/05/2009, at 09:08 , Robert Edwards wrote:
> If you want an ID for _any_ data CD you will probably need to do some
> sort of checksum (md5, sha1 etc.) across the entire disk and live
> with the fact that collisions will occur with a small probability.
You could get away with generating an MD5 and SHA1 checksum of the
first 512kB. Either of them might collide, but I doubt both would
collide on the same disc. These days 512kB is nothing - read a few
hundred blocks from the CD and there you have it.
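
A rough sketch of that idea in Python, assuming the disc (or an image of it) can be opened as an ordinary file. The function name and the chunk size are made up for illustration; only the 512kB figure comes from the text above.

```python
import hashlib

def prefix_fingerprint(path, nbytes=512 * 1024, chunk=64 * 1024):
    """Return (md5_hex, sha1_hex) of the first nbytes of the file at path.

    path may be a device node or an image file; which one to use is
    left to the caller (an assumption of this sketch).
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    remaining = nbytes
    with open(path, "rb") as f:
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:
                # Source is shorter than nbytes: hash what there is.
                break
            md5.update(data)
            sha1.update(data)
            remaining -= len(data)
    return md5.hexdigest(), sha1.hexdigest()
```

Something like prefix_fingerprint("/dev/cdrom") would then give the pair of digests to store as the ID, though the device path is only a guess and differs between systems.
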
It may even be possible to include whatever ATIP data is available (dye
type, manufacturer, etc.). It is unreliable as an information source,
but still a possible seed if you only buy discs in batches of 10 or
20. How much information can you get about the media itself?
You may even find that just using the table of contents for each/every
session on the disk might be enough to ensure uniqueness for your
problem space. I doubt that two disks having the exact same directory
structure will end up being different. At which point you could just
use a checksum on the output of "ls -lR" which is hardly going to be
the entire data content of the disc, but will uniquely identify any
The MD5+SHA1 of "ls -lR" should work to uniquely identify a version of
a CD that is regularly updated by a vendor, for example. They might
ship out a disk labelled "WONDERFUL-UPDATES" which contains updates to
their "Wonderful™" product. The file structure might look exactly the
same each time: one "SETUP.EXE" file in the root of the file system.
You'd expect that the datestamp and file size would change between
updates, so this would be covered by the "ls -lR" (since it contains
all the information).
To future-proof the system, you might run your "ls -lR" with explicit
options for which columns and number formats to display, just in case
a future version of ls decides to default to "human readable" output
(e.g. 3.4M instead of 3400000).
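
One way to sidestep ls formatting altogether is to build the listing yourself. The sketch below is a stand-in for "ls -lR", not a reproduction of it: it walks the mounted disc in sorted order and writes one line per file with columns I picked for illustration (size in bytes, modification time in whole seconds, relative path), then takes the MD5 and SHA1 of that text. The function names are hypothetical.

```python
import hashlib
import os

def listing_lines(root):
    """Yield one fixed-format line per entry under root, in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk's order deterministic
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Directories contribute only their path in this sketch.
            yield f"d\t{rel}"
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            # Explicit columns: raw byte count, integer mtime, path.
            yield f"{st.st_size}\t{int(st.st_mtime)}\t{rel}"

def listing_fingerprint(root):
    """Return (md5_hex, sha1_hex) of the fixed-format listing of root."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    for line in listing_lines(root):
        data = (line + "\n").encode("utf-8", "surrogateescape")
        md5.update(data)
        sha1.update(data)
    return md5.hexdigest(), sha1.hexdigest()
```

Because sizes are always written as plain byte counts and times as integers, the digests should not change just because a tool's default output format does. A new "SETUP.EXE" with a different size or datestamp would still produce a different pair.
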