superlifter design notes (OpenVMS perspective)

John E. Malmberg wb8tyw at
Sat Jul 27 19:11:01 EST 2002

Martin Pool wrote:
> On 22 Jul 2002, "John E. Malmberg" <wb8tyw at> wrote:
>>A clean design allows optimization to be done by the compiler, and tight 
>>optimization should be driven by profiling tools.
> Right.  So, for example, glib has a very smart assembly ntohl() and
> LZO is tight code.  I would much rather use them than try to reduce
> the byte count by a complicated protocol.

Many compilers will inline ntohl(), giving the call very low overhead.

>>>5. Similarly, no silly tricks with forking, threads, or nonblocking
>>>IO: one process, one IO.
>>Forking or multiple processes can be high cost on some platforms.  I am 
>>not experienced with Posix threads to judge their portability.
>>But as long as it is done right, non-blocking I/O is not a problem for me.
>>If you structure the protocol processing where no subroutine ever posts 
>>a write and then waits for a read, you can set up a library that can be 
>>used either blocking or non-blocking.
> Yes, that's how librsync is structured.
> Is it reasonable to assume that some kind of poll/select arrangement
> is available everywhere?  In other words, can I check to see if input
> is available from a socket without needing to block trying to read
> from it?

I can poll, but I prefer to cause the I/O completion to trigger a 
completion routine.  But that is not portable. :-)
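The poll/select question above can be sketched in Python (mentioned only because the thread proposes Python for the test suite).  A zero timeout makes select() a pure poll, so the caller checks for input without ever blocking on the read; the helper name is invented for illustration:

```python
# Sketch: "is input available?" without blocking, via select().
# select() is widely available (POSIX, and Windows for sockets),
# which is what makes this pattern a reasonable portability baseline.
import select
import socket

def input_ready(sock: socket.socket) -> bool:
    """Return True if a read on sock would not block right now."""
    # Zero timeout turns select() into a poll rather than a wait.
    readable, _, _ = select.select([sock], [], [], 0)
    return bool(readable)
```

An event-completion model such as OpenVMS ASTs could sit behind the same interface; only this small layer would change per platform.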

> I would hope that only a relatively small layer needs to know about
> how and when IO is scheduled.  It will make callbacks (or whatever) to
> processes that produce and consume data.  That layer can be adapted,
> or if necessary, rewritten, to use whatever async IO features are
> available on the relevant platform.
>>Test programs that internally fork() are very troublesome for me. 
>>Starting a few hundred individually by a script are not.
> If we always use fork/exec (aka spawn()) is that OK?  Is it only
> processes that fork and that then continue executing the same program
> that cause trouble?

Mainly.  I can deal with spawn() much more easily than fork().
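As a sketch of the spawn()-friendly pattern: Python's subprocess module starts each child as a fresh program image (fork/exec on UNIX, a native spawn elsewhere), so a test driver built this way avoids the troublesome fork-and-continue case.  The helper name is invented:

```python
# Sketch: run one test case in a freshly spawned interpreter instead of
# fork()ing and continuing in the same program, which is the pattern
# that causes trouble on non-UNIX platforms.
import subprocess
import sys

def run_child(code: str) -> str:
    """Spawn a new interpreter to run `code` and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True)
    return result.stdout
```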

>>I can only read UNIX shell scripts of minor complexity.
> Apparently Python runs on VMS.  I'm in favour of using it for the test
> suite; it's much more effective than sh.

Unfortunately the Python maintainer for VMS retired, and I have not been 
able to figure out how to get his source to compile.  I have gotten the 
official Python to compile and link after fixing only one severe 
programming error.  However, it still is not running; I am isolating 
where the problem is in my "free" time.

>>>12. Try to keep the TCP pipe full in both directions at all times.
>>>Pursuing this intently has worked well in rsync, but has also led to
>>>a complicated design prone to deadlocks.
>>Deadlocks can be avoided.
> Do you mean that in the technical sense of "deadlock avoidance"?
> i.e. checking for a cycle of dependencies and failing?  That sounds
> undesirably complex.

No; by not using a complex protocol, so that no deadlocks can arise in 
the first place.

>>>9  Model files as composed of a stream of bytes, plus an optional
>>>table of key-value attributes. Some of these can be distinguished to
>>>model ownership, ACLs, resource forks, etc.
>>Not portable.  This will effectively either exclude all non-UNIX or make 
>>it very difficult to port to them.
> "Non-UNIX" is not completely fair; as far as I know MacOS, Amiga,
> OS/2, Windows, BeOS, and QNX are {byte stream + attributes + forks}
> too.
> I realize there are platforms which are record-oriented, but I don't
> have much experience on them.  How would the rsync algorithm even
> operate on such things?

Record files need to be transmitted on record boundaries, not arbitrary 
boundaries.  Also, random access cannot be used; the file segments need 
to be transmitted in order.

For a UNIX text file, a record is a line of text delimited by the 
line-feed character.
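The record-boundary rule can be sketched as a small framing helper, assuming a byte stream and a single-byte delimiter (a hypothetical illustration, not rsync code); a record-structured platform would plug its own framing into the same shape:

```python
# Sketch: split a byte stream into whole records so that transmission
# always happens on record boundaries.  For a UNIX text file the
# delimiter is the line-feed character; nothing partial is ever emitted
# while more of the record may still arrive.
import io

def records(stream, delimiter=b"\n"):
    """Yield complete records from stream; never a partial record."""
    buffered = b""
    while True:
        chunk = stream.read(4096)
        if not chunk:
            break
        buffered += chunk
        while True:
            record, sep, rest = buffered.partition(delimiter)
            if not sep:
                break           # no full record buffered yet; read more
            yield record + sep  # a whole record, delimiter included
            buffered = rest
    if buffered:
        yield buffered          # trailing data with no final delimiter
```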

[This turned out to be a big problem in porting SAMBA.  An NT client 
transfers a large file by sending 64K, skipping 32K, sending some more, 
and then sending the 32K later.  Samba itself does not do this, so the 
resulting corruption of a record-structured file did not show up in the 
initial testing.  I still have not found the ideal fix for SAMBA, but 
have implemented a workaround.]

> Is it sufficient to model them as ascii+linefeeds internally, and then
> do any necessary translation away from that model on IO?

Yes, as long as no partial records are transmitted.  Partial records can 
be a problem: if I know the rest of the record is coming, then I can 
wait for it, but if the rest of the record is going to be skipped, then 
it takes more work.

>>BINARY files are no real problem.  The binary is either meaningful on 
>>the client or server or it is not.  However file attributes may need to 
>>be maintained.  If the file attributes are maintained, it would be 
>>possible for me to have a OpenVMS indexed file moved up to a UNIX 
>>server, and then back to another OpenVMS system and be usable.
> Possibly it would be nice to have a way to stash attributes that
> cannot be represented on the destination filesystem, but perhaps that
> is out of scope.

I would anticipate having an optional attribute file for each directory 
for attributes common to all files in the directory, and optional 
attribute file for each file.  This would allow a server to handle files 
for other platforms.

>>I recall seeing a comment somewhere in this thread about timestamps 
>>being left to 16 bits.
> No, 32 bits.  16 bits is obviously silly.

Rushed typing.  I meant 32 bits.  Yes 16 bits is obviously silly.

>>File timestamps for OpenVMS and for Windows NT are in 64 bits, but use 
>>different base dates.
> I think we should use something like 64-bit microseconds-since-1970,
> with a precision indicator.

It sounds like a set of host functions/macros will be needed.
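As a sketch of such host conversion functions, assuming the proposed wire format of 64-bit microseconds since 1970: Windows NT file times count 100-nanosecond units since 1601-01-01, so the conversion is a fixed epoch shift plus a scale.  The epoch-gap constant is well known; the function names are invented:

```python
# Sketch: host conversion helpers for a wire format of 64-bit
# microseconds since 1970-01-01 UTC.  NT FILETIME counts 100 ns ticks
# since 1601-01-01; OpenVMS would get an analogous pair with its own
# base date.

# Seconds between 1601-01-01 and 1970-01-01 (the NT epoch gap).
NT_EPOCH_OFFSET_SECONDS = 11644473600

def nt_filetime_to_wire(filetime: int) -> int:
    """NT 100 ns ticks since 1601 -> microseconds since 1970."""
    return filetime // 10 - NT_EPOCH_OFFSET_SECONDS * 1_000_000

def wire_to_nt_filetime(micros: int) -> int:
    """Microseconds since 1970 -> NT 100 ns ticks since 1601."""
    return (micros + NT_EPOCH_OFFSET_SECONDS * 1_000_000) * 10
```

A precision indicator on the wire, as suggested above, would tell the receiver how many of those microsecond digits are actually meaningful.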

>>File attributes need to be stored somewhere, so a reserved directory or 
>>filename convention will need to be used.
>>I assume that there will be provisions for a server to be marked as a 
>>master reference.
> What do you mean "master reference"?

For mirroring of distributions.  The server maintained directly by the 
team would be marked as a tier 1 or "master reference".

The next level of mirrors would get higher-numbered tiers.  This is 
more for bookkeeping of where the files are coming from.

For example, direct access to parts of SAMBA.ORG by everyone would 
overload the server, so the use of local mirrors is recommended.

The primary mirrors would be marked as tier 2, and mirrors of them would 
get a higher number.

Nothing requires this, but it may be useful for some: a level can only 
accept updates from a lower-numbered server.

>>For flexibility, a client may need to provide filename translation, so 
>>the original filename (that will be used on the wire) should be stored 
>>as a file attribute.  It also follows that it probably is a good idea to 
>>store the translated filename as an attribute also.
> Can you give us an example?  Are you talking about things like
> managing case-insensitive systems?

Yes and other issues.

Say you have a source module named foo.dat++; that file name cannot be 
represented on OpenVMS ODS-2 filesystems.

One way of doing this is to have the OpenVMS client convert it to 
FOO.DAT_PLS_PLS when it receives it.

However if the OpenVMS system wants to be a mirror for distribution, it 
needs some way to send that out.

For SAMBA, the file would be stored as FOO.DAT__2B__2B, so SAMBA 
clients will see the original file name, though case is not preserved.

One other issue with OpenVMS ODS-2 is that file names are limited to 39 
characters, a period delimiter, and then another 39 characters.  So you 
can see that there are limits to the hex expansion.

So the server would need some way of knowing which form of the name 
each client needs to see.

The portable solution is to have an attribute file that can optionally 
be beside the real file.  This attribute file can hold those attributes 
that are foreign to the host operating system.

So UNIX systems would see the file as foo.dat++, and OpenVMS could see 
the file as FOO.DAT_PLS_PLS or other local format.
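A translation in the style of the FOO.DAT__2B__2B example can be sketched as a reversible hex expansion.  The safe character set and the `__XX` escape syntax below are illustrative only, not SAMBA's or any real client's exact rules:

```python
# Sketch: expand characters a local filesystem cannot represent into
# __XX hex escapes, reversibly.  "_" is itself escaped so that decoding
# is unambiguous.  The SAFE set here is a rough stand-in for what an
# ODS-2-like filesystem accepts; real rules differ per platform.
SAFE = set("abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           "0123456789.$-")
HEX = set("0123456789ABCDEFabcdef")

def encode_name(name: str) -> str:
    """foo.dat++ -> foo.dat__2B__2B"""
    return "".join(ch if ch in SAFE else "__%02X" % ord(ch)
                   for ch in name)

def decode_name(name: str) -> str:
    """foo.dat__2B__2B -> foo.dat++"""
    out, i = [], 0
    while i < len(name):
        esc = name[i + 2:i + 4]
        if name[i:i + 2] == "__" and len(esc) == 2 and set(esc) <= HEX:
            out.append(chr(int(esc, 16)))
            i += 4
        else:
            out.append(name[i])
            i += 1
    return "".join(out)
```

Storing the original (wire) name as a file attribute, as suggested above, means the mirror never has to guess the inverse mapping.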

Client:
    Hello server, I am looking for the xxxxx distribution.

Server:
    Accepted, I have the xxxxx distribution.

    Here is the first directory, name is main.master.
    Here are the global attributes for the directory.
    Default format for files in this directory is plain text.

Client:
    Directory attributes accepted.
    Please update attributes:
         x_openvms_filename: main_master

Server (possible response one):
    Sorry, you are not on the list of clients I can accept updates from.

Server (possible response two):
    Update has been submitted to the maintainer for review.

Server (possible response three):
    Update has been accepted to attributes.

This way the program can optionally deal with platform specific 
attributes, but not really need to understand them.

The attribute file would probably be in plain text.  This takes up a bit 
more room, but makes it maintainable from a text editor.
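As a sketch, a per-file plain-text attribute file might look like the following; every attribute name here except x_openvms_filename (used in the dialogue above) is invented for illustration:

```
# attributes for foo.dat++ (names illustrative only)
x_wire_filename:      foo.dat++
x_openvms_filename:   FOO.DAT_PLS_PLS
x_record_format:      stream_lf
```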

Of course a platform can store the attributes in any fashion, but I 
would expect that a file would most commonly be used.

wb8tyw at
Personal Opinion Only
