some clarity Re: HFS+ resource forks: WIP patch included

D Andrew Reynhout reynhout at quesera.com
Fri Mar 12 16:17:43 GMT 2004


On Fri, Mar 12, 2004 at 09:41:03AM -0500, Wesley D Craig wrote:
> I'd be happy to work on this myself, since I already have very similar 
> code, if there was some possibility that the rsync maintainers would be 
> willing to accept the modifications.  Otherwise, it seems like a waste 
> of effort.


I think the maintainers have to err on the side of caution
regarding changes to the application, and especially to the
protocol.

If we can find a way to handle metadata in general, and then
implement the specific needs for HFS+, (instead of the other
way around, which has been my approach so far), then I think
they would see more value in making changes.

The problem, as I see it (usual disclaimers apply), is that
rsync depends in many places on a very strict sourcefile-to-
destinationfile mapping.  To make the sourcefile "virtual",
i.e. created on the fly, would require deep restructuring
(and extra memory).  The easier way out is to create a new
sourcefile in the FS from the metadata and add it to the
flist, but that requires local disk space, and unless the
protocol is updated, requires you to trick the sort routine.

The most straightforward and plausible idea I can think of
is to update the protocol to include explicit file-IDs
(instead of implicit offsets in the sorted flist), and add
a "capabilities" header to the protocol version.

e.g.:
    rsync protocol 28
        (means file-IDs (can be) explicit, and should
        exchange local capabilities)
    capabilities: (subset of) read_ufs store_ufs
        read_hfs+metadata store_hfs+metadata
        read_ntfs-streams store_ntfs-streams

The sender and receiver can figure out if they'll be able
to accomplish what they need to by exchanging this info.
(I'm not sure whether rsync "dumbs down" for lower protocol revs.)
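
A rough sketch of that exchange, in Python for brevity. The capability
names come from the example above; the function name, the read_/store_
pairing rule, and the fallback to "no capabilities" below protocol 28
are assumptions for illustration, not anything rsync does today.

```python
# Hypothetical: first protocol version that would exchange capabilities.
CAPS_PROTOCOL = 28

def negotiate(sender_proto, sender_caps, receiver_proto, receiver_caps):
    """Return (protocol, kinds) where kinds are the data kinds the sender
    can read AND the receiver can store."""
    proto = min(sender_proto, receiver_proto)
    if proto < CAPS_PROTOCOL:
        # Older peer: no capability exchange is possible.
        return proto, set()
    usable = set()
    for cap in sender_caps:
        if cap.startswith("read_"):
            kind = cap[len("read_"):]
            if "store_" + kind in receiver_caps:
                usable.add(kind)
    return proto, usable

sender = {"read_ufs", "read_hfs+metadata"}
receiver = {"store_ufs", "store_ntfs-streams"}
print(negotiate(28, sender, 28, receiver))  # (28, {'ufs'})
```

Here the sender can read HFS+ metadata but the receiver cannot store
it, so only plain UFS-style data is common to both ends.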

If the receiver's protocol is less than 28, then the sender
has to decide if it can send all the data it wants to, and
whether to send what it can (or bail) if not.

If the receiver's protocol is >=28, the sender (can) check
the receiver capabilities to see if it should just stream
the whole HFS+ file (a la rsync_hfs), or convert to a 
common-capability format first (AppleSingle, MacBinary),
or elect to split the file out into multiple streams
(AppleDouble, Apple's newer ._<filename> scheme, etc).
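
The two cases above might reduce to a decision like the following
sketch. The strategy names, the function, and its flags are all
hypothetical; only the protocol number and the capability name are
taken from the example earlier in this message.

```python
def choose_strategy(receiver_proto, receiver_caps, has_metadata,
                    prefer_split=False, best_effort=True):
    """Pick how the sender would ship one file (illustrative only)."""
    if not has_metadata:
        return "plain"                 # nothing special to preserve
    if receiver_proto < 28:
        # Old receiver: send what we can (data fork only), or bail.
        return "data-only" if best_effort else "bail"
    if "store_hfs+metadata" in receiver_caps:
        return "native-stream"         # stream the whole HFS+ file
    if prefer_split:
        return "split-streams"         # AppleDouble-style companion file
    return "single-container"          # AppleSingle/MacBinary-style

print(choose_strategy(27, set(), True))
print(choose_strategy(28, {"store_hfs+metadata"}, True))
print(choose_strategy(28, {"store_ufs"}, True, prefer_split=True))
```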

Just getting explicit file-IDs into the protocol would
greatly increase flexibility, if the capabilities idea
is too fraught.  
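
To illustrate why explicit IDs help, here is a toy comparison. It is
not rsync's flist code; the function names and the ".rsrc" filename
are made up. With implicit IDs, a file's ID is its position in the
sorted list, so adding a generated metadata file shifts the IDs of
everything that sorts after it. With explicit IDs, existing entries
keep their numbers.

```python
def implicit_ids(names):
    """ID = offset in the sorted list (the implicit scheme)."""
    return {name: i for i, name in enumerate(sorted(names))}

def explicit_ids(names, existing=None):
    """Keep IDs already assigned; give new names the next free ID."""
    ids = dict(existing or {})
    next_id = max(ids.values(), default=-1) + 1
    for name in names:
        if name not in ids:
            ids[name] = next_id
            next_id += 1
    return ids

files = ["a.txt", "c.txt"]

# Implicit: inserting "b.rsrc" renumbers "c.txt" from 1 to 2.
before = implicit_ids(files)
after = implicit_ids(files + ["b.rsrc"])

# Explicit: "c.txt" keeps ID 1; "b.rsrc" simply gets ID 2.
e_before = explicit_ids(files)
e_after = explicit_ids(["b.rsrc"], e_before)
```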

Rsync was written with UFS and similar file systems deeply
in mind, and then optimized (ad absurdum -- if two files
have the same byte-count, the protocol omits sending that
piece of duplicate information for the second file) in the
interest of keeping network traffic to the ABSOLUTE minimum.

But rsync has become the best tool I know of to move files
between any two machines.  I think metadata awareness and
capabilities would be a great addition.  Filesystem designers
should make the metadata more accessible* but all modern
filesystems have metadata, and I think it will only get more
important as time goes on.

* maybe the POSIX open() should open as a full-metadata
stream so that /bin/cp and such will just *work*.  Another
open function could be used by the OS when it only wants
part of the file (data fork) etc...  Or flags on open(),
but that will break POSIX compliance...the OS could fake
it out, but it might not be worth the developer confusion.
Anyway, so far filesystem/OS designers aren't doing this,
so here we are.

Andrew
reynhout at quesera.com
