superlifter design notes (OpenVMS perspective)
John E. Malmberg
wb8tyw at qsl.net
Sun Jul 21 21:57:01 EST 2002
> Qualities
>
> 1. Be reasonably portable: at least in principle, it should be
> possible to port to Windows, OS X, and various Unixes without major
> changes.
In general, I would like to see OpenVMS in that list.
> Principles
>
> 1. Clean design rather than micro-optimization.
A clean design allows optimization to be done by the compiler, and tight
optimization should be driven by profiling tools.
> 4. Keep the socket open until the client gets bored. (Avoids startup
> time; good for on-line mirroring; good for interactive clients.)
I am afraid I do not quite understand this one. Are you referring to a
server waiting for a reconnect for a while instead of reconnecting?
If so, that seems to be a standard behavior for network daemons.
> 5. Similarly, no silly tricks with forking, threads, or nonblocking
> IO: one process, one IO.
Forking or multiple processes can be high cost on some platforms. I am
not experienced enough with POSIX threads to judge their portability.
But as long as it is done right, non-blocking I/O is not a problem for me.
If you structure the protocol processing where no subroutine ever posts
a write and then waits for a read, you can set up a library that can be
used either blocking or non-blocking.
The same for file access.
On OpenVMS, I can do all I/O in a non-blocking manner. The problem is
that I must use native I/O calls to do so.
If the structure is that after any I/O, control returns to a common
point for the next step in the protocol, then it is easy to move from a
blocking implementation to a non-blocking one. MACROs can probably be
used to allow common code to be used for blocking or non-blocking
implementations.
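
A minimal sketch of that structure, in Python for brevity. The names
`Want`, `protocol_steps`, and `run_blocking` are hypothetical and come
from neither rsync nor this thread. Each protocol step hands an I/O
request back to a single dispatcher instead of performing the I/O
itself, so no step ever posts a write and then waits for a read:

```python
import socket
from dataclasses import dataclass


@dataclass
class Want:
    """An I/O request handed back to the dispatcher (hypothetical type)."""
    op: str            # "write" or "read"
    data: bytes = b""  # payload for a write
    size: int = 0      # exact byte count wanted for a read


def protocol_steps():
    """Toy protocol: send a command, then read a 2-byte reply.

    The generator never performs I/O itself. After every request,
    control returns to whichever dispatcher is driving it.
    """
    yield Want("write", data=b"PING\n")
    reply = yield Want("read", size=2)
    return reply


def run_blocking(steps, sock):
    """Blocking dispatcher: carries out each request in turn."""
    try:
        request = next(steps)
        while True:
            if request.op == "write":
                sock.sendall(request.data)
                result = None
            else:
                buf = b""
                while len(buf) < request.size:
                    chunk = sock.recv(request.size - len(buf))
                    if not chunk:
                        raise ConnectionError("peer closed early")
                    buf += chunk
                result = buf
            # Resume the protocol at the common point with the I/O result.
            request = steps.send(result)
    except StopIteration as done:
        return done.value
```

A non-blocking driver could resume the same generator from a select
loop or an I/O completion callback, leaving `protocol_steps` unchanged.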
Two systems that both use non-blocking mode can push more data through
in the same time period.
This is an area where I can offer help to produce a clean implementation.
One of the obstacles to my cleanly implementing RSYNC as a single
process is a subroutine that waits for a response to a command that it
sent. If that subroutine is called from an asynchronous event, it
blocks all other execution in that process. That same practice hurts in
SAMBA.
> 8. Design for testability. For example: don't rely on global
> resources that may not be available when testing; do make behaviours
> deterministic to ease testing.
Test programs that internally fork() are very troublesome for me.
Starting a few hundred individually from a script is not.
I can only read UNIX shell scripts of minor complexity.
> 10. Have a design that is as simple as possible.
> 11. "Smart clients, dumb servers." This is claimed to be a good
> design pattern for internet software. rsync at the moment does not
> really adhere to it. Part of the point of rsync is that having a
> smarter server can make things much more efficient. A strength of
> this approach is that to add features, you (often) only need to add
> them to the client.
It should be a case of who can do the job easier.
> 12. Try to keep the TCP pipe full in both directions at all times.
> Pursuing this intently has worked well in rsync, but has also led to
> a complicated design prone to deadlocks.
Deadlocks can be avoided. Make sure that whenever an I/O is initiated,
the next step is to return to the protocol dispatching routine.
> General design ideas
>
> 9 Model files as composed of a stream of bytes, plus an optional
> table of key-value attributes. Some of these can be distinguished to
> model ownership, ACLs, resource forks, etc.
Not portable. This will effectively either exclude all non-UNIX
platforms or make it very difficult to port to them.
Binary files are a stream of bytes.
Text files are a stream of records. Many systems do not store text
files as a stream of bytes. They may or may not even be ASCII.
If you are going to maintain meta files for ACLs and Resource Forks,
then there should be some provision to supply attributes for an entire
directory or individual files.
BINARY files are no real problem. The binary is either meaningful on
the client or server or it is not. However file attributes may need to
be maintained. If the file attributes are maintained, it would be
possible for me to have an OpenVMS indexed file moved up to a UNIX
server, and then back to another OpenVMS system and be usable.
Currently in order to do so, I must encapsulate them in a .ZIP archive.
That is .ZIP, not GZIP or BZIP. On OpenVMS those are only useful to
transfer source and a limited subset of binaries.
TEXT files are much different than binary files, except on UNIX.
A text file needs to be processed by records, and on many systems the
records cannot be updated randomly, or if they can, it is not very
efficient.
If one target use for this program is assisting in cross-platform open
source synchronization, then it really needs to address text files
properly.
A server should know how to represent a TEXT file in a portable format
to the client. ASCII stream records delimited by line feeds are
probably the most convenient.
The client would be responsible to make sure that a TEXT file is in a
local format.
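
A minimal sketch of that split of responsibilities, in Python. The
function names are hypothetical, and it assumes that records are plain
ASCII and never contain a line feed themselves. The server side encodes
records to the wire format, and the client side decodes them and
renders them in whatever its local format is (CRLF-terminated lines are
used here purely as an example):

```python
def records_to_wire(records):
    """Server side: encode text records as LF-delimited ASCII."""
    out = []
    for record in records:
        if "\n" in record:
            raise ValueError("a record may not contain a line feed")
        out.append(record.encode("ascii") + b"\n")
    return b"".join(out)


def wire_to_records(data):
    """Client side: split LF-delimited wire data back into records."""
    if not data:
        return []
    if not data.endswith(b"\n"):
        raise ValueError("last record is not terminated")
    # Drop the final terminator so split() does not invent an empty record.
    return [line.decode("ascii") for line in data[:-1].split(b"\n")]


def records_to_local(records, terminator="\r\n"):
    """Client side: render records in an example local stream format."""
    return "".join(record + terminator for record in records)
```

On a system with a record-oriented file system, `records_to_local`
would be replaced by native record writes. The wire format would stay
the same.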
Additional note #1:
I recall seeing a comment somewhere in this thread about timestamps
being left to 16 bits.
File timestamps for OpenVMS and for Windows NT are in 64 bits, but use
different base dates.
Using 16 bits for timestamps will result in a loss of data for these
platforms. For applications like open source distribution, the data
loss is probably not significant. For BACKUP type applications, it can
be significant.
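
As a rough illustration of the differing base dates, here is a Python
sketch. Both formats count 100-nanosecond units, OpenVMS from
17-Nov-1858 and Windows NT from 1-Jan-1601. The function names are
hypothetical, and time zone handling is ignored: every value is treated
as UTC, which is an assumption made only to keep the example short.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
VMS_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)
NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

TICKS_PER_SECOND = 10_000_000  # both formats count 100 ns units


def _offset_ticks(epoch):
    """Ticks between the given base date and the UNIX epoch."""
    delta = UNIX_EPOCH - epoch
    return (delta.days * 86400 + delta.seconds) * TICKS_PER_SECOND


def unix_to_vms(seconds, nanoseconds=0):
    """UNIX seconds (plus nanoseconds) to an OpenVMS-style 64-bit time."""
    ticks = seconds * TICKS_PER_SECOND + nanoseconds // 100
    return ticks + _offset_ticks(VMS_EPOCH)


def unix_to_nt(seconds, nanoseconds=0):
    """UNIX seconds (plus nanoseconds) to a Windows NT-style 64-bit time."""
    ticks = seconds * TICKS_PER_SECOND + nanoseconds // 100
    return ticks + _offset_ticks(NT_EPOCH)


def vms_to_nt(vms_ticks):
    """Rebase an OpenVMS-style time onto the Windows NT base date."""
    return vms_ticks + _offset_ticks(NT_EPOCH) - _offset_ticks(VMS_EPOCH)
```

A wire format that carries a 64-bit count of 100 ns units from one
agreed base date would let either platform convert without loss.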
Additional note #2:
File attributes need to be stored somewhere, so a reserved directory or
filename convention will need to be used.
I assume that there will be provisions for a server to be marked as a
master reference.
Additional note #3:
For flexibility, a client may need to provide filename translation, so
the original filename (that will be used on the wire) should be stored
as a file attribute. It also follows that it probably is a good idea to
store the translated filename as an attribute also.
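
A small Python sketch of what storing both names could look like. The
attribute keys `wire_name` and `local_name`, and the translation rule
itself, are hypothetical placeholders, not a proposed format:

```python
def to_local_name(wire_name, forbidden='<>:"|?*'):
    """Hypothetical translation: replace characters that the local
    file system is assumed to reject with an underscore."""
    return "".join("_" if c in forbidden else c for c in wire_name)


def make_attributes(wire_name):
    """Build the attribute table entries for one file, keeping both
    the original (wire) name and the translated (local) name."""
    return {
        "wire_name": wire_name,
        "local_name": to_local_name(wire_name),
    }
```

Keeping both names lets a client map a local file back to its wire
name without having to invert the translation rule.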
-John
wb8tyw at qsl.network
Personal Opinion Only