superlifter design notes (OpenVMS perspective)

Mon Jul 29 21:22:01 EST 2002

To help explain why backup and file distribution have such different
implementation issues, let me give some background.

This is a dump of an OpenVMS native text file.  This is the format that
virtually all text editors on OpenVMS produce.

Dump of file PROJECT_ROOT:[rsync_vms]CHECKSUM.C_VMS;1 on 29-JUL-2002 22:02:21.32
File ID (118449,3,0)   End of file block 8 / Allocated 8

Virtual block number 1 (00000001), 512 (0200) bytes

  67697279 706F4320 20200025 2A2F0002 ../*%.   Copyrig 000000
  72542077 6572646E 41202943 28207468 ht (C) Andrew Tr 000010
  20200024 00363939 31206C6C 65676469 idgell 1996.$.   000020
  50202943 28207468 67697279 706F4320  Copyright (C) P 000030
  39312073 61727265 6B63614D 206C7561 aul Mackerras 19 000040
  72702073 69685420 20200047 00003639 96..G.   This pr 000050

Each record is preceded by a 16-bit count of how long the record is.
While any value can be present in a record, usually only printable
ASCII is present.
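
As a rough sketch of the translation described next, the following C
function turns a buffer of count-prefixed records into a stream of
LF-terminated lines.  It assumes what the dump above appears to show:
a little-endian 16-bit count, and a pad byte after odd-length records
(the 00 after "1996").  The name records_to_stream is hypothetical,
and other RMS details such as records spanning blocks are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert count-prefixed records into a stream of
 * LF-terminated lines.  Assumes a little-endian 16-bit count and one pad
 * byte after odd-length records, as seen in the dump.
 * Returns the number of bytes written to out, or (size_t)-1 if the input
 * is malformed or out is too small. */
size_t records_to_stream(const uint8_t *in, size_t in_len,
                         char *out, size_t out_cap)
{
    size_t ip = 0;  /* position in the record buffer */
    size_t op = 0;  /* position in the output stream */

    while (ip + 2 <= in_len) {
        size_t len = (size_t)in[ip] | ((size_t)in[ip + 1] << 8);
        ip += 2;
        if (len > in_len - ip || len + 1 > out_cap - op)
            return (size_t)-1;
        for (size_t i = 0; i < len; i++)
            out[op++] = (char)in[ip + i];
        out[op++] = '\n';           /* the simulated line terminator */
        ip += len + (len & 1);      /* skip data plus any pad byte */
    }
    return op;
}
```

Note that the output is not the same size as the input: each record
loses its 2-byte count and pad byte and gains one line feed.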

When this type of file is read in through a C program, the records are 
translated so that it looks like each line of text is terminated by a 
line feed character.

So if I am just using a program ported from UNIX to read text files, 
there is no problem.  And pure binary files are not a problem because 
they have attributes that tell the I/O system that they are binary, not 
text files.

But the problem comes in when the remote system sends a request to
update the middle of a file.  It sends me a byte offset.  At this
point, the program must have independently kept track of where that
simulated offset falls in the real file.  As long as the file is always
sent in sequence, I have a hope of getting this right.  If the file
updates are sent in a random order, I cannot.
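
To illustrate why random offsets are hard, here is a minimal sketch of
mapping an offset in the simulated stream back to a position in the
record data.  It assumes the same layout as the dump (little-endian
16-bit count, pad byte after odd-length records); the function name
stream_to_physical is hypothetical.  The only way to find the answer
is to walk every record from the start of the file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map an offset in the LF-terminated stream view
 * to an offset in the count-prefixed record buffer.
 * Returns 0 on success.  Returns -1 if the offset is past the end, the
 * data is malformed, or the offset lands on a simulated line feed, which
 * has no physical byte behind it. */
int stream_to_physical(const uint8_t *rec, size_t rec_len,
                       size_t stream_off, size_t *phys_off)
{
    size_t ip = 0;  /* physical position of the current record */
    size_t sp = 0;  /* stream position of the current record */

    while (ip + 2 <= rec_len) {
        size_t len = (size_t)rec[ip] | ((size_t)rec[ip + 1] << 8);
        if (len > rec_len - ip - 2)
            return -1;
        if (stream_off < sp + len) {
            *phys_off = ip + 2 + (stream_off - sp);
            return 0;
        }
        if (stream_off == sp + len)
            return -1;              /* the simulated LF */
        sp += len + 1;              /* data plus simulated LF */
        ip += 2 + len + (len & 1);  /* count, data, optional pad */
    }
    return -1;
}
```

When updates arrive in sequence, the two positions can simply be
carried forward; with random offsets, each request costs a scan like
this one.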

These are the issues for using an rsync-like program for file
distribution.  All I need to know is whether the file being transferred
is binary or text.  And while the ideal is for the system hosting the
file to identify it, this can be faked by having a mapping of file types
to default attributes.
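
One possible shape for such a mapping, as a sketch only: a small table
from file extension to a default transfer mode.  The table entries,
the enum, and the name guess_mode are all assumptions made for
illustration, and anything not listed falls back to binary so that no
translation is attempted on unknown data.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum xfer_mode { XFER_BINARY, XFER_TEXT };

struct type_rule {
    const char *ext;
    enum xfer_mode mode;
};

/* Illustrative defaults only; a real table would be configurable. */
static const struct type_rule default_rules[] = {
    { ".c",   XFER_TEXT },
    { ".h",   XFER_TEXT },
    { ".txt", XFER_TEXT },
    { ".exe", XFER_BINARY },
    { ".obj", XFER_BINARY },
};

/* Case-insensitive match of a rule against the extension in a name.
 * The name's extension may be followed by an OpenVMS ";version". */
static int ext_matches(const char *ext, const char *rule)
{
    while (*rule != '\0') {
        if (tolower((unsigned char)*ext) != tolower((unsigned char)*rule))
            return 0;
        ext++;
        rule++;
    }
    return *ext == '\0' || *ext == ';';
}

/* Guess the transfer mode from the file name; default is binary. */
enum xfer_mode guess_mode(const char *name)
{
    const char *dot = strrchr(name, '.');
    size_t n = sizeof default_rules / sizeof default_rules[0];

    if (dot != NULL) {
        for (size_t i = 0; i < n; i++) {
            if (ext_matches(dot, default_rules[i].ext))
                return default_rules[i].mode;
        }
    }
    return XFER_BINARY;
}
```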

So for text file transfers, as long as the sections are sent in
sequence, this is not a problem.

Now for backup, if I am assuming that the system that will eventually
use the backup understands the file format of the source, I can open the
files as binary, so I do not have to be concerned about keeping track of
where the logical offset maps to the physical offset.  However, I have a
whole new set of issues.

The file must be opened in "binary" mode.  On an fopen() call, the "b"
mode qualifier causes the file to be opened in binary mode, so no
translation is done.  This flag is part of the ISO C standard.  It has
no effect on UNIX, but it is important on other platforms.
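
A minimal example of the fopen() case, using only standard C.  The
helper name read_raw is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read up to cap raw bytes from a file.
 * The "b" in the mode string asks the C library not to translate
 * the data.  Returns the byte count, or -1 if the file cannot be
 * opened. */
long read_raw(const char *path, unsigned char *buf, size_t cap)
{
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, cap, fp);
    fclose(fp);
    return (long)n;
}
```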

For an open() call, a special operating system extension is needed to 
open the file in binary mode.

Then there are the file attributes:

CHECKSUM.C_VMS;1              File ID:  (118449,3,0)
Size:            8/8          Owner:    [SYSOP,MALMBERG]
Created:   29-JUL-2002 22:01:37.95
Revised:   29-JUL-2002 22:01:38.01 (1)
Expires:   <None specified>
Backup:    <No backup recorded>
Effective: <None specified>
Recording: <None specified>
File organization:  Sequential
Shelved state:      Online
Caching attribute:  Writethrough
File attributes:    Allocation: 8, Extend: 0, Global buffer count: 0
                    No version limit
Record format:      Variable length, maximum 0 bytes, longest 71 bytes
Record attributes:  Carriage return carriage control
RMS attributes:     None
Journaling enabled: None
File protection:    System:RWED, Owner:RWED, Group:RWED, World:RE
Access Cntrl List:  None
Client attributes:  None

And this is for a simple file format.  Files can be indexed or have 
multiple keys.

And there is no cross-platform API for retrieving all of these
attributes, so how do you determine how to transmit them?

Security is another issue:

In some cases the binary values for the access control entries need to
be preserved, and in other cases the text values need to be preserved.
A translation from one set of text or binary values to another set may
also be needed.

And again, there are no cross-platform APIs for returning this information.

So a backup-type application is going to have to have a lot of
platform-specific tweaks, and some way to pass all this varied
information between the client and server.  As each platform is added,
an extension may need to be developed.
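
One common way to leave room for such extensions is a tag/length/value
list, where a receiver skips any tag it does not understand.  The
sketch below is purely illustrative and is not a proposed wire format:
the tag numbers, the big-endian 16-bit fields, and the names attr_put
and attr_find are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag numbers, for illustration only. */
enum {
    ATTR_MTIME      = 0x0001,  /* common to all platforms */
    ATTR_VMS_RECFMT = 0x8001   /* platform-specific extension */
};

/* Append one attribute at pos as: 2-byte tag, 2-byte length, value.
 * Returns the new end position, or 0 if it does not fit. */
size_t attr_put(uint8_t *buf, size_t cap, size_t pos,
                uint16_t tag, const void *val, uint16_t len)
{
    if (pos > cap || cap - pos < (size_t)len + 4)
        return 0;
    buf[pos]     = (uint8_t)(tag >> 8);
    buf[pos + 1] = (uint8_t)(tag & 0xff);
    buf[pos + 2] = (uint8_t)(len >> 8);
    buf[pos + 3] = (uint8_t)(len & 0xff);
    if (len > 0)
        memcpy(buf + pos + 4, val, len);
    return pos + 4 + len;
}

/* Find the first attribute with the given tag, stepping over all others.
 * Returns a pointer to its value and stores its length in *len_out, or
 * returns NULL if the tag is absent or the list is malformed. */
const uint8_t *attr_find(const uint8_t *buf, size_t used,
                         uint16_t tag, uint16_t *len_out)
{
    size_t pos = 0;

    while (used - pos >= 4) {
        uint16_t t = (uint16_t)((buf[pos] << 8) | buf[pos + 1]);
        uint16_t l = (uint16_t)((buf[pos + 2] << 8) | buf[pos + 3]);
        if (l > used - pos - 4)
            return NULL;
        if (t == tag) {
            *len_out = l;
            return buf + pos + 4;
        }
        pos += 4 + (size_t)l;
    }
    return NULL;
}
```

With a layout like this, a UNIX receiver that has never heard of
ATTR_VMS_RECFMT can still read ATTR_MTIME from the same list.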

A server definitely needs to know if it is in backup mode as opposed to 
file distribution mode.

In file distribution mode, only a few file attributes need to be 
preserved, and a loss of precision of dates is usually not a problem.
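
As a small illustration of tolerating reduced date precision, two
modification times can be treated as equal when they differ by no more
than a chosen window, similar in spirit to rsync's --modify-window
option.  The name same_mtime and the use of whole seconds are
assumptions of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: treat two timestamps (in seconds) as the same
 * if they differ by at most window_sec.  Returns 1 if they match. */
int same_mtime(int64_t a_sec, int64_t b_sec, int64_t window_sec)
{
    int64_t diff = a_sec > b_sec ? a_sec - b_sec : b_sec - a_sec;
    return diff <= window_sec;
}
```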

So while the two applications could be done in a single image, I still 
am of the opinion that they should be developed separately.

Maybe share a common support library, but I think that keeping them as 
separate programs may be better for support and development.

Especially if you mean for these to be cross platform.

It is likely that the backup function otherwise would only be useful for 
a subset of platforms.

Is it fair for the people who can only use the file distribution part
of the package to be burdened with the backup portion when porting?

It just seems that it is not too difficult to come up with a
cross-platform file distribution system that uses the principles
developed with rsync.

A backup-type application is going to be a problem to make
cross-platform, and is likely to be limited to a subset of UNIX systems.

Or maybe a build option to build a full function superlifter, or just a 
superlifter lite?

-John
wb8tyw at qsl.network
Personal Opinion Only