superlifter design notes (OpenVMS perspective)

Sun Jul 21 23:08:01 EST 2002

On Mon, Jul 22, 2002 at 03:34:37PM +1000, Martin Pool wrote:
> On 22 Jul 2002, "John E. Malmberg" <wb8tyw at qsl.net> wrote:
> > 
> > If you structure the protocol processing where no subroutine ever posts 
> > a write and then waits for a read, you can set up a library that can be 
> > used either blocking or non-blocking.
> 
> Yes, that's how librsync is structured.
> 
> Is it reasonable to assume that some kind of poll/select arrangement
> is available everywhere?  In other words, can I check to see if input
> is available from a socket without needing to block trying to read
> from it?

I think we can assume that any platform supporting POSIX I/O
semantics will be sufficient.

> I would hope that only a relatively small layer needs to know about
> how and when IO is scheduled.  It will make callbacks (or whatever) to
> processes that produce and consume data.  That layer can be adapted,
> or if necessary, rewritten, to use whatever async IO features are
> available on the relevant platform.

That is the better approach.  Use I/O routines so that most
processing can be written as "while (get_input()) { process(); send_output(); }".
Then the I/O routines can be defined per platform.

[snip]

> > > 9  Model files as composed of a stream of bytes, plus an optional
> > > table of key-value attributes. Some of these can be distinguished to
> > > model ownership, ACLs, resource forks, etc.
> > 
> > Not portable.  This will effectively either exclude all non-UNIX or make 
> > it very difficult to port to them.
> 
> "Non-UNIX" is not completely fair; as far as I know MacOS, Amiga,
> OS/2, Windows, BeOS, and QNX are {byte stream + attributes + forks}
> too.
> 
> I realize there are platforms which are record-oriented, but I don't
> have much experience on them.  How would the rsync algorithm even
> operate on such things?
> 
> Is it sufficient to model them as ascii+linefeeds internally, and then
> do any necessary translation away from that model on IO?
> 
> > BINARY files are no real problem.  The binary is either meaningful on 
> > the client or server or it is not.  However file attributes may need to 
> > be maintained.  If the file attributes are maintained, it would be 
> > possible for me to have an OpenVMS indexed file moved up to a UNIX 
> > server, and then back to another OpenVMS system and be usable.

If a platform has some special type of file it would be
responsible for converting to/from a multi-segment
bytestream.

By multi-segment bytestream I mean a sequence of
binary_data blocks, each having an offset and length.  In this way
we have the potential to deal with sparse files and to
packetize the transfers of large files.  Offset and length
must both be 64-bit, since files can exceed 4 GB.

> 
> Possibly it would be nice to have a way to stash attributes that
> cannot be represented on the destination filesystem, but perhaps that
> is out of scope.

In general, we have to expect that we can transfer only
the lowest common denominator of file attributes.

It would be possible to build a server that didn't depend on
local filesystem semantics and so could support an attribute
superset.  But that is out of scope for now.

> > File timestamps for OpenVMS and for Windows NT are in 64 bits, but use 
> > different base dates.
> 
> I think we should use something like 64-bit microseconds-since-1970,
> with a precision indicator.
> 
> > File attributes need to be stored somewhere, so a reserved directory or 
> > filename convention will need to be used.
> > 
> > I assume that there will be provisions for a server to be marked as a 
> > master reference.
> 
> What do you mean "master reference"?

See my super/subset comment above.

> 
> > For flexibility, a client may need to provide filename translation, so 
> > the original filename (that will be used on the wire) should be stored 
> > as a file attribute.  It also follows that it probably is a good idea to 
> > store the translated filename as an attribute also.
> 
> Can you give us an example?  Are you talking about things like
> managing case-insensitive systems?

Filenames should be null-terminated UTF-8.  If a given
platform cannot support that, the port to that platform will be
responsible for conversion.  We probably should designate an
inline subroutine for filename conversion.  The only
alternative would be to restrict filenames to ASCII
[-_.A-Za-z0-9] or something similarly restrictive.  I find
the use of funny chars (including space) in filenames
offensive, but we need to deal with internationalization and
sheer stupidity.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt