superlifter design notes (OpenVMS perspective)

Tue Jul 30 00:22:02 EST 2002

On Tue, Jul 30, 2002 at 12:00:21AM -0400, John E. Malmberg wrote:
> To help explain why the backup and file distribution have such different 
> implementation issues, let me give some background.
> 
> 
> This is a dump of an OpenVMS native text file.  This is the format that 
> virtually all text editors produce on it.
> 
> Dump of file PROJECT_ROOT:[rsync_vms]CHECKSUM.C_VMS;1 on 29-JUL-2002 
> 22:02:21.32
> File ID (118449,3,0)   End of file block 8 / Allocated 8
> 
> Virtual block number 1 (00000001), 512 (0200) bytes
> 
>  67697279 706F4320 20200025 2A2F0002 ../*%.   Copyrig 000000
>  72542077 6572646E 41202943 28207468 ht (C) Andrew Tr 000010
>  20200024 00363939 31206C6C 65676469 idgell 1996.$.   000020
>  50202943 28207468 67697279 706F4320  Copyright (C) P 000030
>  39312073 61727265 6B63614D 206C7561 aul Mackerras 19 000040
>  72702073 69685420 20200047 00003639 96..G.   This pr 000050
> 
> Each record is preceded by a 16 bit count of how long the record is. 
> While any value can be present in a record, usually only printable 
> ASCII is present.

> The file must be open in "binary" mode.  On an fopen() call, the "b" 
> mode qualifier causes the file to be opened in binary mode, so no 
> translation is done.  This has no effect on UNIX, but it is important on 
> other file platforms.  This flag is documented as part of the ISO C 
> standard, but has no effect on a UNIX platform.

While VMS and a few other OSs make the distinction between
text and binary files, the VMS approach is fairly unique.
UNIX is our primary focus and I don't intend to get bogged
down with OS specifics on all platforms.  POSIX has no
mechanism for determining the content of files.  All files
are binary.

To meet your record-oriented text file needs I would say
that the VMS port would need to have options and extra
logic.  For backups all files could be opened with the "b"
mode qualifier.
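
As a minimal sketch of the backup case, the helper below opens
a file with the "b" qualifier and reads it with no text
translation by the C library.  The name read_raw() is made up
for illustration.  What the C runtime on VMS actually returns
for a record-oriented file in this mode is not covered here.

```c
#include <assert.h>
#include <stdio.h>

/* Read up to cap bytes of a file with no text translation.
 * The "b" in the mode string is ISO C; on UNIX it changes nothing.
 * Returns the number of bytes read, or -1 on error.
 * read_raw() is a hypothetical helper, not part of any existing tool. */
long read_raw(const char *path, unsigned char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = fread(buf, 1, cap, fp);
    int failed = ferror(fp);
    fclose(fp);

    return failed ? -1 : (long)n;
}
```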

For sending to non-VMS systems text files would want
conversion to another format, and for receiving some
heuristics would identify text files for conversion
(updating text files could take advantage of the local
file's attributes).  Such file conversions would require
in-core translation for checksums, file length and change
merges.  This puts them into the same category as unix2dos
text-file conversions and backup compression.  Such file
conversions are outside the scope of current consideration
but where possible we should keep them in mind for future
enhancement.
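
To make the in-core translation concrete, here is a rough
sketch that turns count-prefixed records into newline-terminated
lines.  It assumes what the dump above appears to show: a 16-bit
little-endian count before each record and one pad byte after an
odd-length record.  Those layout details are inferred from the
dump, not taken from RMS documentation, and the function name
is hypothetical.  Checksums and file lengths would then be
computed over the converted stream rather than the stored bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Convert count-prefixed records into newline-terminated lines.
 * Assumptions (inferred from the dump, not from an RMS reference):
 *   - each record starts with a 16-bit little-endian byte count;
 *   - an odd-length record is followed by one pad byte.
 * A trailing fragment shorter than a count is ignored.
 * Returns bytes written to dst, or -1 if a record runs past the
 * end of src or dst is too small. */
long records_to_lines(const unsigned char *src, size_t src_len,
                      char *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);

    size_t in = 0, out = 0;
    while (in + 2 <= src_len) {
        size_t len = (size_t)src[in] | ((size_t)src[in + 1] << 8);
        in += 2;
        if (len > src_len - in || len + 1 > dst_cap - out)
            return -1;
        for (size_t i = 0; i < len; i++)
            dst[out++] = (char)src[in + i];
        dst[out++] = '\n';      /* implied carriage control becomes LF */
        in += len + (len & 1);  /* skip the pad byte after odd lengths */
    }
    return (long)out;
}
```

The reverse direction (lines back to records) would be needed
on the receiving VMS side and is not shown.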

> Then there are the file attributes:
> 
> CHECKSUM.C_VMS;1              File ID:  (118449,3,0)
> Size:            8/8          Owner:    [SYSOP,MALMBERG]
> Created:   29-JUL-2002 22:01:37.95
> Revised:   29-JUL-2002 22:01:38.01 (1)
> Expires:   <None specified>
> Backup:    <No backup recorded>
> Effective: <None specified>
> Recording: <None specified>
> File organization:  Sequential
> Shelved state:      Online
> Caching attribute:  Writethrough
> File attributes:    Allocation: 8, Extend: 0, Global buffer count: 0
>                     No version limit
> Record format:      Variable length, maximum 0 bytes, longest 71 bytes
> Record attributes:  Carriage return carriage control
> RMS attributes:     None
> Journaling enabled: None
> File protection:    System:RWED, Owner:RWED, Group:RWED, World:RE
> Access Cntrl List:  None
> Client attributes:  None
> 
> And this is for a simple file format.  Files can be indexed or have 
> multiple keys.
> 
> And there is no cross platform API for retrieving all of these 
> attributes, so how do you determine how to transmit them through?

We can't rely on a pre-existing cross-platform API.   What
I'm inclined toward is to use native I/O routines.  The
protocol would be focused on UNIX file semantics.  We might
add a few reasonable additional bits for those platforms
that will be VERY common interoperators.  These other
attributes I would treat as special extended attributes.
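
One way this could look, purely as a sketch: a fixed set of
UNIX-style fields plus a list of opaque, tagged extended
attributes that only the originating platform interprets.  Every
name and field width below is an assumption made for
illustration; no protocol has been defined yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: none of these names come from an existing protocol. */
struct ext_attr {
    uint16_t platform;          /* which platform defines this attribute */
    uint16_t tag;               /* platform-specific attribute number */
    uint32_t len;               /* length of the opaque value in bytes */
    const unsigned char *value; /* not interpreted by other platforms */
};

struct file_attrs {
    uint32_t mode;              /* UNIX permission and type bits */
    uint32_t uid, gid;
    int64_t  mtime;
    uint64_t size;
    size_t   n_ext;
    const struct ext_attr *ext; /* extended attributes, possibly none */
};

/* Look up one extended attribute.  A receiver that does not know a
 * platform simply never asks for its tags.  Returns NULL when absent. */
const struct ext_attr *find_ext(const struct file_attrs *fa,
                                uint16_t platform, uint16_t tag)
{
    assert(fa != NULL);
    for (size_t i = 0; i < fa->n_ext; i++)
        if (fa->ext[i].platform == platform && fa->ext[i].tag == tag)
            return &fa->ext[i];
    return NULL;
}
```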

> Security is another issue:
> 
> In some cases the binary values for the access control entries need to 
> be preserved, and in other cases, the text values need to be preserved.
> It also may need a translation from one set of text or binary values to 
> another set.
> 
> And again, there are no cross platform API's for returning this information.

See above.  We need to support binary IDs and text IDs and
ID squashing.  I'm not sure yet but mode bits will probably
be binary.  There is no reason to transmit them as text.
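
The three ID modes might be sketched roughly as below.  The
policy names, the lookup table and the fallback to the squash
ID when a name is unknown are all assumptions for the sake of
the example, not a description of how rsync or anything else
does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not rsync's implementation. */
enum id_policy { ID_NUMERIC, ID_BY_NAME, ID_SQUASH };

struct local_id { const char *name; uint32_t id; };

/* Choose the local owner ID for a received file.
 *   ID_NUMERIC: keep the sender's binary ID unchanged.
 *   ID_BY_NAME: look the sender's text ID up locally; fall back to
 *               squash_id when the name is unknown (an assumption).
 *   ID_SQUASH:  map every sender ID to squash_id. */
uint32_t map_id(enum id_policy policy, uint32_t remote_id,
                const char *remote_name,
                const struct local_id *table, size_t n,
                uint32_t squash_id)
{
    assert(table != NULL || n == 0);

    if (policy == ID_NUMERIC)
        return remote_id;
    if (policy == ID_BY_NAME && remote_name != NULL)
        for (size_t i = 0; i < n; i++)
            if (strcmp(table[i].name, remote_name) == 0)
                return table[i].id;
    return squash_id;
}
```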

> So a backup type application is going to have to have a lot of platform 
> specific tweaks, and some way to pass all this varied information 
> between the client and server.  As each platform is added, an extension 
> may need to be developed.

Platform specific tweaks will only be built into the
binaries for that platform.  The protocol will have certain
UNIX centricities but the flexibility to transmit platform
specifics.

> A server definitely needs to know if it is in backup mode as opposed to 
> file distribution mode.
> 
> In file distribution mode, only a few file attributes need to be 
> preserved, and a loss of precision of dates is usually not a problem.
> 
> So while the two applications could be done in a single image, I still 
> am of the opinion that they should be developed separately.
> 
> Maybe share a common support library, but I think that keeping them as 
> separate programs may be better for support and development.

For the server that may be the approach.

> It is likely that the backup function otherwise would only be useful for 
> a subset of platforms.

Agreed.  A backup server is limited by the semantics it can
support.  If a given platform cannot support all of the
semantics of another it would be a poor choice for that use.
That is already a limitation of rsync.  There are a couple
of patches floating around that allow non-unix platforms to
use rsync for same-platform backups but otherwise rsync is
POSIX only.

If our routines and protocol are robust and well enough
defined someone may wish to use them with something more
extensive than just a native filesystem for multi-platform
backups.  To the client such a backup server would be
indistinguishable from the normal utility/daemon.

> 
> Is it fair to have the people that can only use the file distribution 
> part of the package, when porting be burdened with the backup portion?

If someone wanted to create a port that had limited
functionality that would be his choice.  If we use masks to
designate attributes to be updated or compared such a
limited port could set that mask according to its
capabilities.
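
A minimal sketch of the mask idea, with made-up flag names: the
request names the attributes to update or compare, the port
advertises what it supports, and the intersection is what
actually gets done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical attribute flags; the real set is not yet defined. */
enum {
    ATTR_MODE  = 1u << 0,
    ATTR_OWNER = 1u << 1,
    ATTR_GROUP = 1u << 2,
    ATTR_MTIME = 1u << 3,
    ATTR_ACL   = 1u << 4,
    ATTR_EXT   = 1u << 5,
    ATTR_ALL   = (1u << 6) - 1
};

/* Attributes that will actually be updated or compared. */
uint32_t effective_mask(uint32_t requested, uint32_t supported)
{
    assert((supported & ~(uint32_t)ATTR_ALL) == 0);
    return requested & supported;
}

/* Attributes the peer asked for that this port cannot honor,
 * so it can warn or refuse instead of losing them silently. */
uint32_t dropped_mask(uint32_t requested, uint32_t supported)
{
    assert((supported & ~(uint32_t)ATTR_ALL) == 0);
    return requested & ~supported;
}
```

A limited port would set its supported mask to something like
ATTR_MODE | ATTR_MTIME and leave the rest clear.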

> 
> It just seems that it is not too difficult to come up with a cross 
> platform file distribution system that uses the principles developed 
> with rsync.
> 
> A backup type application is going to be a problem for cross platform, 
> and is likely to be limited to a subset of UNIX systems.

From your description, text file handling for
distribution is more problematic than file attributes for
cross-platform backups.

> 
> Or maybe a build option to build a full function superlifter, or just a 
> superlifter lite?

I thank you for your input on this.  While I don't want to
get bogged down in trying to support every strange system to
the burden of the majority, I don't want to lock out other
platforms either.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt