superlifter design notes (OpenVMS perspective)

John E. Malmberg wb8tyw at qsl.net
Sun Jul 21 21:57:01 EST 2002


 > Qualities
 >
 > 1. Be reasonably portable: at least in principle, it should be
 > possible to port to Windows, OS X, and various Unixes without major
 > changes.

In general, I would like to see OpenVMS in that list.

 > Principles
 >
 > 1. Clean design rather than micro-optimization.

A clean design allows optimization to be done by the compiler, and tight 
optimization should be driven by profiling tools.

 > 4. Keep the socket open until the client gets bored. (Avoids startup
 > time; good for on-line mirroring; good for interactive clients.)

I am afraid I do not quite understand this one.  Are you referring to a 
server waiting for a reconnect for a while instead of reconnecting?

If so, that seems to be a standard behavior for network daemons.

 > 5. Similarly, no silly tricks with forking, threads, or nonblocking
 > IO: one process, one IO.

Forking or multiple processes can be high cost on some platforms.  I am 
not experienced enough with POSIX threads to judge their portability.

But as long as it is done right, non-blocking I/O is not a problem for me.

If you structure the protocol processing so that no subroutine ever 
posts a write and then waits for a read, you can set up a library that 
can be used either blocking or non-blocking.

The same for file access.

On OpenVMS, I can do all I/O in a non-blocking manner.  The problem is 
that I must use native I/O calls to do so.

If the structure is that after any I/O, control returns to a common 
point for the next step in the protocol, then it is easy to move from a 
blocking implementation to a non-blocking one.  MACROs can probably be 
used to allow common code to be used for blocking or non-blocking 
implementations.
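
One way to picture the structure described above is a protocol core 
that only queues writes and accepts reads, with all real I/O done by a 
separate driver.  What follows is a minimal sketch under stated 
assumptions, not rsync code: the names ProtocolEngine, LoopbackPeer and 
run_blocking, the line-oriented commands, and the uppercase-echo peer 
are all hypothetical and exist only for illustration.

```python
from collections import deque


class ProtocolEngine:
    """Hypothetical protocol core.  It never performs I/O itself."""

    def __init__(self, commands):
        self.pending = deque(commands)  # commands not yet queued for sending
        self.outbox = deque()           # bytes the driver should write
        self.inbuf = b""                # partial reply not yet terminated
        self.replies = []               # complete replies received so far
        self.awaiting = 0               # replies still outstanding

    def step(self):
        """Common dispatch point: queue the next write, then return."""
        if self.pending:
            self.outbox.append(self.pending.popleft() + b"\n")
            self.awaiting += 1

    def feed(self, data):
        """Accept whatever bytes the driver read; never wait for more."""
        self.inbuf += data
        while b"\n" in self.inbuf:
            line, self.inbuf = self.inbuf.split(b"\n", 1)
            self.replies.append(line)
            self.awaiting -= 1

    def done(self):
        return not self.pending and not self.outbox and self.awaiting == 0


class LoopbackPeer:
    """Toy in-memory stand-in for a socket: echoes writes in uppercase."""

    def __init__(self):
        self.buf = b""

    def sendall(self, data):
        self.buf += data.upper()

    def recv(self, size):
        data, self.buf = self.buf[:size], self.buf[size:]
        return data


def run_blocking(engine, conn):
    """Blocking driver: every read and write happens here, not in the engine."""
    while not engine.done():
        engine.step()
        while engine.outbox:
            conn.sendall(engine.outbox.popleft())
        if engine.awaiting:
            data = conn.recv(4096)
            if not data:
                raise ConnectionError("peer closed before replying")
            engine.feed(data)


engine = ProtocolEngine([b"list", b"get file1"])
run_blocking(engine, LoopbackPeer())
print(engine.replies)
```

A non-blocking driver would call the same step() and feed() methods 
from its completion or readiness callbacks, so the protocol code itself 
would not need to change.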

Two systems that both use non-blocking I/O can push more data through 
in the same time period.

This is an area where I can offer help to produce a clean implementation.

One of the obstacles to me cleanly implementing RSYNC as a single 
process is when a subroutine is waiting for a response to a command that 
it sent.  If that subroutine is called from an asynchronous event, it 
blocks all other execution in that process.  That same practice hurts in 
SAMBA.


 > 8. Design for testability. For example: don't rely on global
 > resources that may not be available when testing; do make behaviours
 > deterministic to ease testing.

Test programs that internally fork() are very troublesome for me. 
Starting a few hundred individually from a script is not.

I can only read UNIX shell scripts of minor complexity.

 > 10. Have a design that is as simple as possible.

 > 11. "Smart clients, dumb servers." This is claimed to be a good
 > design pattern for internet software. rsync at the moment does not
 > really adhere to it. Part of the point of rsync is that having a
 > smarter server can make things much more efficient. A strength of
 > this approach is that to add features, you (often) only need to add
 > them to the client.

It should be a case of who can do the job easier.

 > 12. Try to keep the TCP pipe full in both directions at all times.
 > Pursuing this intently has worked well in rsync, but has also led to
 > a complicated design prone to deadlocks.

Deadlocks can be avoided.  Make sure if an I/O is initiated, that the 
next step is to return to the protocol dispatching routine.

 > General design ideas
 >
 > 9  Model files as composed of a stream of bytes, plus an optional
 > table of key-value attributes. Some of these can be distinguished to
 > model ownership, ACLs, resource forks, etc.

Not portable.  This will effectively either exclude all non-UNIX 
systems or make it very difficult to port to them.

Binary files are a stream of bytes.

Text files are a stream of records.  Many systems do not store text 
files as a stream of bytes.  They may or may not even be ASCII.

If you are going to maintain meta files for ACLs and Resource Forks, 
then there should be some provision to supply attributes for an entire 
directory or individual files.

BINARY files are no real problem.  The binary is either meaningful on 
the client or server or it is not.  However file attributes may need to 
be maintained.  If the file attributes are maintained, it would be 
possible for me to have an OpenVMS indexed file moved up to a UNIX 
server, and then back to another OpenVMS system and still be usable.

Currently in order to do so, I must encapsulate them in a .ZIP archive.
That is .ZIP, not GZIP or BZIP.  On OpenVMS those are only useful to 
transfer source and a limited subset of binaries.

TEXT files are much different than binary files, except on UNIX.

A text file needs to be processed by records, and on many systems the 
records cannot be updated randomly, or if they can, it is not very 
efficient.

If a target use for this program is to be for assisting in cross 
platform open source synchronization, then it really needs to properly 
address the text files.

A server should know how to represent a TEXT file in a portable format 
to the client.  Stream records in ASCII, delimited by line feeds, are 
probably the most convenient.

The client would be responsible to make sure that a TEXT file is in a 
local format.
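
The split of responsibilities above could look roughly like the 
following.  This is only a sketch: the function names are hypothetical, 
the record terminator passed to records_to_local() stands in for 
whatever the local file system really needs, and a server whose text is 
not ASCII would have to transcode it before this step.

```python
def records_to_wire(records):
    """Server side: encode text records as LF-delimited ASCII bytes."""
    out = bytearray()
    for rec in records:
        if "\n" in rec:
            raise ValueError("a record may not contain a line feed")
        # encode() raises UnicodeEncodeError for non-ASCII text.
        out += rec.encode("ascii") + b"\n"
    return bytes(out)


def wire_to_records(data):
    """Client side: split the wire form back into individual records."""
    if data and not data.endswith(b"\n"):
        raise ValueError("wire text must end with a line feed")
    # The final split element is the empty string after the last LF.
    return [line.decode("ascii") for line in data.split(b"\n")[:-1]]


def records_to_local(records, terminator):
    """Client side: render records using a local record terminator."""
    return "".join(rec + terminator for rec in records)


wire = records_to_wire(["hello", "", "world"])
print(wire)
print(records_to_local(wire_to_records(wire), "\r\n"))
```

On a record-oriented file system the client would write each record 
with its native record interface instead of joining them with a 
terminator.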



Additional note #1:

I recall seeing a comment somewhere in this thread about timestamps 
being left to 16 bits.

File timestamps for OpenVMS and for Windows NT are in 64 bits, but use 
different base dates.

Using 16 bits for timestamps will result in a loss of data for these 
platforms.  For applications like open source distribution, the data 
loss is probably not significant.  For BACKUP type applications, it can 
be significant.
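
To illustrate the base-date difference: both platforms count 
100-nanosecond intervals, OpenVMS from 17-Nov-1858 and Windows NT from 
1-Jan-1601.  The sketch below converts between them through the Unix 
epoch.  The function names are hypothetical, and treating both values 
as UTC is a simplifying assumption (OpenVMS system time is often local 
time).

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
VMS_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)  # OpenVMS base date
NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)     # Windows NT base date
TICKS_PER_SECOND = 10_000_000  # both platforms count 100 ns intervals


def epoch_offset_ticks(epoch):
    """Number of 100 ns ticks from `epoch` up to the Unix epoch."""
    return (UNIX_EPOCH - epoch).days * 86_400 * TICKS_PER_SECOND


def to_unix_ticks(native_ticks, epoch):
    """Rebase a native tick count so that zero means the Unix epoch."""
    return native_ticks - epoch_offset_ticks(epoch)


def from_unix_ticks(unix_ticks, epoch):
    """Rebase a Unix-relative tick count onto a platform's base date."""
    return unix_ticks + epoch_offset_ticks(epoch)


def convert(native_ticks, src_epoch, dst_epoch):
    """Convert between two platforms without losing any resolution."""
    return from_unix_ticks(to_unix_ticks(native_ticks, src_epoch), dst_epoch)


def fits_unsigned(value, bits):
    """True if `value` can be stored in an unsigned field of `bits` bits."""
    return 0 <= value < (1 << bits)


vms_now = from_unix_ticks(0, VMS_EPOCH)  # the Unix epoch as an OpenVMS time
print(vms_now, convert(vms_now, VMS_EPOCH, NT_EPOCH))
print(fits_unsigned(vms_now, 16), fits_unsigned(vms_now, 64))
```

Carrying the full 64-bit value plus an indication of its base date 
would let a BACKUP-style client restore the timestamp exactly.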


Additional note #2:

File attributes need to be stored somewhere, so a reserved directory or 
filename convention will need to be used.

I assume that there will be provisions for a server to be marked as a 
master reference.


Additional note #3:

For flexibility, a client may need to provide filename translation, so 
the original filename (that will be used on the wire) should be stored 
as a file attribute.  It also follows that it probably is a good idea to 
store the translated filename as an attribute also.
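
A small sketch of the idea, under loud assumptions: the translation 
rule below (uppercase, keep only the last dot) is invented for 
illustration and is not the real naming rule of OpenVMS or any other 
system, and the attribute keys "wire-name" and "local-name" are 
hypothetical.

```python
def translate_name(wire_name):
    """Hypothetical local-naming rule: uppercase, keep only the last dot."""
    stem, dot, ext = wire_name.rpartition(".")
    if not dot:
        # No dot at all: rpartition put the whole name in `ext`.
        return wire_name.upper()
    return stem.replace(".", "_").upper() + "." + ext.upper()


def make_attributes(wire_name):
    """Record both names so that either side can map back to the other."""
    return {"wire-name": wire_name, "local-name": translate_name(wire_name)}


def build_index(wire_names):
    """Index attributes by local name, refusing silent collisions."""
    by_local = {}
    for name in wire_names:
        attrs = make_attributes(name)
        local = attrs["local-name"]
        if local in by_local:
            raise ValueError(
                f"{name!r} and {by_local[local]['wire-name']!r} "
                f"both translate to {local!r}"
            )
        by_local[local] = attrs
    return by_local


index = build_index(["rsync-2.5.5.tar", "ReadMe.txt"])
for local, attrs in index.items():
    print(local, "<-", attrs["wire-name"])
```

Keeping the wire name as an attribute is what lets the client send the 
original name back to the server even though the local file system 
could not store it.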



-John
wb8tyw at qsl.net
Personal Opinion Only





