[Bug 8654] link-by-hash: Fix (non-exploitable) buffer overflow

samba-bugs at samba.org samba-bugs at samba.org
Fri Dec 16 20:06:26 MST 2011


--- Comment #11 from Chris Dunlop <chris at onthe.net.au> 2011-12-17 03:06:25 UTC ---
Created attachment 7202
  --> https://bugzilla.samba.org/attachment.cgi?id=7202
link-by-hash: checksum hash required when copying file

(In reply to comment #10)
> Thanks for pointing out the various problems with the patch, and providing some
> suggested fixes.
> I made several changes:

Looks good!

Your additional --debug=hashlink option was almost word for word identical to a
patch I had sitting here. (I'm learning to not sent off patches as soon as I
create them!)

BTW, referencing your rsync-patches commit message, the surname is "Dunlop"
(like the tyres) rather than "Dunlap" :-)

See additional patch here: link-by-hash requires the checksum hash when it's
copying a file so that it can properly link into the link farm. In the absence
of this patch I'm seeing it trying to link to the all-zeros hash name (perhaps
if a previous file had carried a checksum the copy file would link to that
previous checksum?).

Given existing link farms are already affected by the change to the MD5 hash,
it's worth considering another change...

There's a general issue with the link farm directory structure, currently using
an 8-char top level directory and 24-char second level directory, before
getting to the actual link files.

As the whole point of md5 is to evenly distribute the hash, this means you'll
essentially end up with a bucket-load of top level directories (up to 16^8),
each containing one second level directory, each containing one link file. 
That's pretty expensive in terms of directory inodes and blocks and associated
I/O etc.

Perhaps a better structure would be a two-char top level directory (giving 256
in all) and 30-char second level directory, so the top level directories each
contain a reasonable number of second level directories.

That would still leave all your second level directories with one or very few
link files, so maybe even better would be to replace the second level
directories directly with the link files, with a file extension to indicate
different files mapping to the same hash. E.g. instead of
hash[0-7]/hash[8-32]/0, have hash[0-1]/hash[2-32]_00. Even though the top level
directories could end up with large numbers of files, if you have that many
files in the first place you're likely using a non-relic file system with
b-tree indexed directories.

Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.

More information about the rsync mailing list