superlifter design notes (was Re: ...

John E. Malmberg wb8tyw at qsl.net
Sat Jul 27 20:37:01 EST 2002


Martin Pool wrote:
> On 27 Jul 2002, "John E. Malmberg" <wb8tyw at qsl.net> wrote:
> 
>>A program serving source files for distribution does not need to be that 
>>concerned with preserving exact file attributes, but may need to track 
>>suggested file attributes for for the various client platforms.
>>
>>A program that is replicating for backup  purposes must not have any 
>>loss of data, including any operating specific file attributes.
>>
>>That is why I posted previously that they should be designed as two 
>>separate but related programs.
> 
> I'm not sure that the application space for rsync really divides
> neatly into two parts like that.  Can you expand a bit more on how
> you think they would be used?

Well remember, I am on the outside looking in, and of course I could be 
missing things. :-)

I did post this previously, but the message apparently got buried in the 
large number of messages posted that day.


The two uses for rsync that I am seeing discussed on this list are:

Backup:  A low-overhead and possibly long-distance backup of disks or 
directories.

In the case of a backup, the target is usually the same platform, or one 
that is very close to being the same.  It is also important that security 
information and file attributes all be properly maintained.

The mapping of security information is platform specific, so this is 
going to be an ongoing problem.  It is also critical that timestamps be 
maintained.

Since this is usually the same or closely similar platforms, a VFS layer 
can be used to store and retrieve attributes.  No special attribute 
files or host based translations should be needed.
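
A minimal sketch of what "store and retrieve attributes" could look like 
on same-platform hosts, in Python.  This is only an illustration: the 
function names capture_attrs and apply_attrs are hypothetical, it assumes 
a POSIX-like system, and it covers only permission bits and timestamps, 
not ACLs or other platform-specific security information.

```python
import os
import stat


def capture_attrs(path):
    """Collect a small, portable subset of a file's attributes."""
    st = os.stat(path)
    return {
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        "atime_ns": st.st_atime_ns,        # access time, nanoseconds
        "mtime_ns": st.st_mtime_ns,        # modification time, nanoseconds
    }


def apply_attrs(path, attrs):
    """Reapply attributes captured by capture_attrs() to another file."""
    os.chmod(path, attrs["mode"])
    os.utime(path, ns=(attrs["atime_ns"], attrs["mtime_ns"]))
```

The receiving side would call apply_attrs() after the file data has been 
written, so that writing the data does not disturb the restored timestamps.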

The downsides are that, as far as I can see, there are no portable 
standard APIs to retrieve the security information, and as more variants 
are discovered, it may be hard to work them in while keeping backward 
compatibility.

Because you are distributing an arbitrary set of directories, it is 
usually not permitted to add files to assist in the transfer.


This also seems to be an addition to rsync's original mission.

Also, using something like rsync for backup of binary files has the 
potential for undetected corruption.  While the checksumming algorithm 
is good, it is not guaranteed to be perfect.  And no, I do not want to 
recycle the old arguments about this.

With a text file, the set of possible values is restricted enough that 
it is unlikely that the checksum method would fail, and if it did, the 
resulting corruption is more easily detected.


File Distribution:  A low overhead method of keeping local source 
directory trees synchronized with remote distributions.

In this case, strict binary preservation of time stamps is not needed 
and maintaining security attributes is usually not desired.  So that is 
two problems eliminated.

What rsync does not do now is differentiate between text files and 
binary files.  A client that uses a different internal format for text 
files than for binary files needs to do extra work.

And unless the server tells it what type of file is coming, it must 
guess based on the filename.
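
To illustrate how fragile that guess is, here is a rough sketch in 
Python.  The extension table and the function name guess_is_text are 
assumptions made up for this example, not anything rsync does.

```python
import os

# Hypothetical table of extensions treated as text; a real client would
# need a much longer, platform-tuned list.
TEXT_EXTENSIONS = {".c", ".h", ".txt", ".html", ".pl", ".py", ".sh"}


def guess_is_text(filename):
    """Guess from the name alone whether a file should be stored as text."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in TEXT_EXTENSIONS


for name in ("main.C", "archive.tar.gz", "Makefile"):
    print(name, guess_is_text(name))
```

Note that "Makefile" has no extension, so this guess treats it as binary 
even though it is a text file.  That is the kind of mistake a hint from 
the server would avoid.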

But you are specifically distributing a special tree of files in this 
case, not an arbitrary directory.  That gives you the ability to add 
special attribute files to assist in the transfer.
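
As a sketch of what such an attribute file might look like, assume a 
purely hypothetical format with one "<type> <path>" entry per line, where 
the type is either "text" or "binary".  Neither the format nor the 
function name parse_attr_file comes from rsync; they are invented here 
for illustration.

```python
def parse_attr_file(text):
    """Parse '<type> <path>' lines into a {path: type} dictionary."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kind, path = line.split(None, 1)  # path may contain spaces
        if kind not in ("text", "binary"):
            raise ValueError("unknown file type: " + kind)
        attrs[path] = kind
    return attrs


sample = """\
# suggested types for this source tree
text README
text Makefile
binary images/logo.gif
"""
print(parse_attr_file(sample))
```

A client could consult this table first and fall back to guessing from 
the filename only for paths that are not listed.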


So while the two uses have a lot in common, there are significant 
differences, and having one program attempt to do both can lead to 
greater complexity.

-John
wb8tyw at qsl.net
Personal Opinion Only





