OK, I'm brand new to this group, brand new to rsync, brand new to unix
in general. I'm trying to play catch up with this discussion so there
are likely many misconceptions that I have about these issues.
My goal is to create a tool that does backup and restore, transferring
only changes. It will connect from Mac OS X to a server running Linux
and preserve all metadata without the user ever knowing there is an
issue. I've found that the rsync algorithm is a good start, and it
sounds like you all have the same idea.
I don't think I like the idea of the MacBinary solution, in that I can
see some configuration of the tool that the user will have to worry
about. We obviously don't want the overhead of flattening files that
have no forks, or files whose FileInfo can be determined from other
metadata strategies. The user might have to maintain a list of files
they use... how do I handle this file or that (à la the Mac CVS
tools)?
I see another user experience issue with the MacBinary solution and
the protocol change. What do the files look like when they get backed
up? If I connect to the server via the Finder, am I going to see a
bunch of files that are 'archived', or do I get the real deal? As a
user I wouldn't use rsync if I couldn't just go and grab the files
that got backed up. Not that running the file through StuffIt is a
big deal, but it's going to seem a bit clunky even if the solution is
in fact much more extensible.
What format is this new protocol going to produce? Will the only way
to get to the files be to use the rsync client? Sorry, that's just
not acceptable.
The only solution left is to pre-process the file by splitting it
before creating the change lists, so that comparisons can be made if
the file is split on the server. There will have to be some
intelligence about which method of splitting is used on the server,
but I'm positive that couldn't be too hard to determine. Directory
metadata just has to be handled in another file as well; isn't that
what .DSInfo files are? I'm starting to think that what I'm proposing
is more of a combination of 2) and 3). Wouldn't it be great if we
could support ACLs as well? Please tell me if I'm way off base here.
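Just to make the pre-processing idea concrete, here's a rough sketch in Python; the function name, the .rsrc/.meta suffixes, and the stream layout are all my own invention for illustration, not anything rsync actually defines:

```python
# Hypothetical sketch: split one HFS-style file into separate named
# streams so each stream can go through the normal rsync delta
# comparison on its own. Suffixes here are made up for illustration.

def split_for_rsync(name, data_fork, rsrc_fork, finder_info):
    """Return a dict of stream-name -> bytes for per-stream comparison."""
    streams = {name: data_fork}
    if rsrc_fork:                          # only ship a resource stream if one exists
        streams[name + ".rsrc"] = rsrc_fork
    streams[name + ".meta"] = finder_info  # type/creator/flags, etc.
    return streams

streams = split_for_rsync("report", b"text...", b"RSRC...", b"TEXTttxt")
# produces streams named "report", "report.rsrc", "report.meta"
```

The point is that both sides split the same way, so each fork gets its own change list and a file whose resource fork never changes costs almost nothing to re-sync.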
One other question that I'm sure will show my ignorance of Darwin
development. What is the issue with using the high-level APIs if the
output is compatible with the other platforms running rsync? What is
the advantage of trying for POSIX purity, or code at the "Darwin
level", if the code is only going to be used on Macs running the
higher-level stuff anyway? If you don't have a forked file system,
why would you care whether you know how to handle forks?
I'm planning on taking this project on full time, and we would all
benefit if we can agree on a direction.
Let's get this thing going,
Terrence Geernaert
Mark Valence wrote:
1) Convert (on the fly) all files to MacBinary before
comparing/sending them to the destination. MacBinary is a well
documented way to package an HFS file into a single data file. The
benefits with this method are compatibility with existing rsync
versions that are not MacBinary aware, while the drawbacks are speed,
maintainability, and that directory metadata is not addressed at all.
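For anyone who hasn't looked at the format, the flattening in option 1 amounts to something like the following simplified sketch (CRC, dates, and Finder flags are omitted, so this is not a complete MacBinary II implementation, just the general shape):

```python
import struct

def pack_macbinary(name, file_type, creator, data_fork, rsrc_fork):
    """Simplified MacBinary-style flattening: a 128-byte header followed
    by the data fork and resource fork, each padded to a 128-byte
    boundary. Several real header fields are left out of this sketch."""
    name_b = name.encode("ascii")[:63]
    hdr = bytearray(128)
    hdr[1] = len(name_b)                     # filename length
    hdr[2:2 + len(name_b)] = name_b          # filename
    hdr[65:69] = file_type                   # four-char OSType, e.g. b"TEXT"
    hdr[69:73] = creator                     # e.g. b"ttxt"
    struct.pack_into(">I", hdr, 83, len(data_fork))   # data fork length
    struct.pack_into(">I", hdr, 87, len(rsrc_fork))   # resource fork length

    def pad(b):                              # pad each fork to 128 bytes
        return b + b"\x00" * (-len(b) % 128)

    return bytes(hdr) + pad(data_fork) + pad(rsrc_fork)

blob = pack_macbinary("readme", b"TEXT", b"ttxt", b"hello", b"")
```

You can see why the drawbacks are speed and maintainability: every file gets copied through this packing on the way out and unpacked on the way in, and directory metadata has nowhere to live at all.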
2) Treat the two forks and metadata as three separate files for the
purposes of comparison/sending, and then reassemble them on the
destination. Same drawbacks and benefits as the MacBinary route. This
would also take more memory (potentially three times the number of
files in the flist).
3) Change the protocol and implementation to handle arbitrary metadata
and multiple forks. This could be made sort-of compatible with
existing rsyncs by using various tricks, but the most efficient way
would be to alter the protocol. Benefits are that this would make the
protocol extensible. Metadata can be "tagged" so that you could add
any values needed, and ignore those tags that are not understood or
supported. Any number of forks could be supported, which gives a step
up in supporting NTFS where a file can have any number of "data
streams". In fact, forks and metadata could all be done in the same
way in the protocol.
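The "tagged" metadata idea is basically a tag-length-value encoding. A minimal sketch, with tag numbers and function names invented purely for illustration (the real protocol would assign its own):

```python
import struct

# Hypothetical tags; a receiver silently skips any tag it doesn't know,
# which is what makes the scheme extensible.
TAG_DATA_FORK  = 1
TAG_RSRC_FORK  = 2
TAG_FINDERINFO = 3

def encode(items):
    """Pack (tag, payload) pairs as 16-bit tag, 32-bit length, payload."""
    out = b""
    for tag, payload in items:
        out += struct.pack(">HI", tag, len(payload)) + payload
    return out

def decode(buf, known_tags):
    """Yield (tag, payload) for tags we understand; skip the rest."""
    off = 0
    while off < len(buf):
        tag, length = struct.unpack_from(">HI", buf, off)
        off += 6
        if tag in known_tags:
            yield tag, buf[off:off + length]
        off += length

wire = encode([(TAG_DATA_FORK, b"data"), (99, b"??"), (TAG_RSRC_FORK, b"rsrc")])
seen = dict(decode(wire, {TAG_DATA_FORK, TAG_RSRC_FORK}))
# tag 99 is unknown to this receiver and is skipped without error
```

An NTFS data stream or an ACL would just be one more tag, which is the step up in extensibility Mark describes.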