superlifter design notes (OpenVMS perspective)
John E. Malmberg
wb8tyw at qsl.net
Mon Jul 29 21:22:01 EST 2002
To help explain why the backup and file distribution have such different
implementation issues, let me give some background.
This is a dump of an OpenVMS native text file. This is the format that
virtually all text editors on OpenVMS produce.
Dump of file PROJECT_ROOT:[rsync_vms]CHECKSUM.C_VMS;1 on 29-JUL-2002 22:02:21.32
File ID (118449,3,0) End of file block 8 / Allocated 8
Virtual block number 1 (00000001), 512 (0200) bytes
67697279 706F4320 20200025 2A2F0002 ../*%. Copyrig 000000
72542077 6572646E 41202943 28207468 ht (C) Andrew Tr 000010
20200024 00363939 31206C6C 65676469 idgell 1996.$. 000020
50202943 28207468 67697279 706F4320 Copyright (C) P 000030
39312073 61727265 6B63614D 206C7561 aul Mackerras 19 000040
72702073 69685420 20200047 00003639 96..G. This pr 000050
Each record is preceded by a 16 bit count of how long the record is.
While any value can be present in a record, usually only printable
ASCII is present.
When this type of file is read in through a C program, the records are
translated so that it looks like each line of text is terminated by a
line feed character.
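The translation described above can be sketched in C. This is only an illustration, not the C run-time library's actual code. It assumes the count is stored low byte first and that odd-length records are followed by one pad byte, which is what the dump above suggests (a 0x25 byte record is followed by a 00 before the next count). The name records_to_text is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a buffer of count-prefixed records into LF-terminated text.
 * Assumptions (inferred from the dump, not stated in the post): the
 * 16 bit count is little-endian, and a record with an odd length is
 * followed by one pad byte.
 * Returns the number of bytes written to out, or (size_t)-1 on error. */
size_t records_to_text(const unsigned char *in, size_t in_len,
                       char *out, size_t out_cap)
{
    size_t pos = 0;      /* physical position in the record buffer */
    size_t written = 0;  /* bytes of simulated text produced so far */

    assert(in != NULL && out != NULL);
    while (pos + 2 <= in_len) {
        size_t count = (size_t)in[pos] | ((size_t)in[pos + 1] << 8);
        pos += 2;
        if (count > in_len - pos || count + 1 > out_cap - written)
            return (size_t)-1;       /* truncated input or output full */
        memcpy(out + written, in + pos, count);
        written += count;
        out[written++] = '\n';       /* the line feed exists only here */
        pos += count + (count & 1);  /* skip the data and any pad byte */
    }
    return written;
}
```

Feeding it the first bytes of the dump (02 00 2F 2A ...) would yield "/*" followed by a line feed, then the copyright line, and so on. Real files have further details, such as block boundaries, that this sketch ignores.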
So if I am just using a program ported from UNIX to read text files,
there is no problem. And pure binary files are not a problem because
they have attributes that tell the I/O system that they are binary, not
text files.
But the problem comes in when the remote system sends a request to
update the middle of a file. It sends me a byte offset. At that
point, the program has to have kept track independently of where that
simulated offset falls in the real file. As long as the file is always
sent in sequence, I have a hope of getting this right. If the file
updates are sent in a random order, I cannot.
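To make the offset problem concrete, here is a rough C sketch of mapping a simulated (line-feed terminated) byte offset back to a physical offset in the record data. It assumes a little-endian 16 bit count and one pad byte after odd-length records, as the dump suggests. The function name and the two error codes are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define OFFSET_PAST_END   (-1L)  /* logical offset is beyond the data */
#define OFFSET_ON_NEWLINE (-2L)  /* offset hits a simulated line feed */

/* Map an offset in the simulated text stream to an offset in the
 * physical record buffer by walking every record from the start.
 * Assumes little-endian counts and even-aligned records. */
long logical_to_physical(const unsigned char *in, size_t in_len,
                         size_t logical)
{
    size_t pos = 0;   /* physical position of the next count field */
    size_t seen = 0;  /* logical bytes accounted for so far */

    assert(in != NULL);
    while (pos + 2 <= in_len) {
        size_t count = (size_t)in[pos] | ((size_t)in[pos + 1] << 8);
        pos += 2;
        if (count > in_len - pos)
            return OFFSET_PAST_END;        /* truncated record */
        if (logical < seen + count)
            return (long)(pos + (logical - seen));
        if (logical == seen + count)
            return OFFSET_ON_NEWLINE;      /* no physical byte exists */
        seen += count + 1;                 /* data plus the line feed */
        pos += count + (count & 1);        /* data plus any pad byte */
    }
    return OFFSET_PAST_END;
}
```

There is no shortcut: every lookup walks the records from the beginning, and an offset that lands on a simulated line feed has no physical byte at all. With sequential updates the two positions can simply be carried forward, which is why ordering matters so much here.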
So these are the issues with using an rsync-like program for file
distribution. All I need to know is whether the file being transferred
is binary or text. And while the ideal is for the system hosting the
file to identify it, this can be faked by having a mapping of file types
to default attributes.
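Such a mapping might look like the following C sketch. The table contents, the names xfer_mode and guess_mode, and the choice to treat unknown types as binary are all assumptions made for illustration. It expects a bare file name (no device or directory) and stops at a ";" version suffix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum xfer_mode { XFER_BINARY, XFER_TEXT };

/* Hypothetical defaults; a real table would be site-configurable. */
static const struct { const char *ext; enum xfer_mode mode; } type_map[] = {
    { "c", XFER_TEXT },     { "h", XFER_TEXT },     { "txt", XFER_TEXT },
    { "exe", XFER_BINARY }, { "obj", XFER_BINARY }, { "zip", XFER_BINARY },
};

/* Case-insensitive match of ext[0..len) against a known extension. */
static int ext_matches(const char *ext, size_t len, const char *known)
{
    size_t i;

    if (strlen(known) != len)
        return 0;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)ext[i]) != tolower((unsigned char)known[i]))
            return 0;
    return 1;
}

/* Guess the transfer mode from the file type of a bare file name. */
enum xfer_mode guess_mode(const char *name)
{
    const char *dot, *end;
    size_t i, len;

    assert(name != NULL);
    dot = strrchr(name, '.');
    if (dot == NULL)
        return XFER_BINARY;                 /* no file type at all */
    end = strchr(dot, ';');                 /* optional ";version" */
    len = (end ? (size_t)(end - dot) : strlen(dot)) - 1;
    for (i = 0; i < sizeof type_map / sizeof type_map[0]; i++)
        if (ext_matches(dot + 1, len, type_map[i].ext))
            return type_map[i].mode;
    return XFER_BINARY;                     /* unknown: do not translate */
}
```

With this table, "CHECKSUM.C;1" is treated as text, while "CHECKSUM.C_VMS;1" falls through to the binary default because its type is not listed.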
So for text file transfers, as long as the sections are sent in
sequence, there is no problem.
Now for backup, if I am assuming that the system that will eventually
use the backup understands the file format of the source, I can open the
files as binary, so I do not have to be concerned about keeping track of
where the logical offset maps to the physical offset. However, I have a
whole new set of issues.
The file must be opened in "binary" mode. On an fopen() call, the "b"
mode qualifier causes the file to be opened in binary mode, so no
translation is done. This flag is documented as part of the ISO C
standard. It has no effect on a UNIX platform, but it is important on
other platforms.
For an open() call, a special operating system extension is needed to
open the file in binary mode.
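A small C sketch of both calls follows. The fopen() part is plain ISO C. For open(), it uses the O_BINARY flag that some non-UNIX C libraries provide and defines it to 0 where it is missing; that is one common convention, not something every platform offers. OpenVMS has its own extension for this, which is left out here because it should be checked against the C run-time library documentation. The helper names are made up.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

#ifndef O_BINARY
#define O_BINARY 0  /* assumption: no such flag means no translation */
#endif

/* Open a stdio stream with the ISO C "b" qualifier so that the
 * library performs no text translation on the data. */
FILE *open_binary_stream(const char *path, int for_write)
{
    assert(path != NULL);
    return fopen(path, for_write ? "wb" : "rb");
}

/* Open a file descriptor for reading without text translation.
 * O_BINARY is a platform extension; it is 0 here when absent. */
int open_binary_fd(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_BINARY);
}
```

On a UNIX system both helpers behave exactly like their text-mode equivalents, which is why ported programs often omit the flag and then break elsewhere.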
Then there are the file attributes:
CHECKSUM.C_VMS;1 File ID: (118449,3,0)
Size: 8/8 Owner: [SYSOP,MALMBERG]
Created: 29-JUL-2002 22:01:37.95
Revised: 29-JUL-2002 22:01:38.01 (1)
Expires: <None specified>
Backup: <No backup recorded>
Effective: <None specified>
Recording: <None specified>
File organization: Sequential
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 8, Extend: 0, Global buffer count: 0
No version limit
Record format: Variable length, maximum 0 bytes, longest 71 bytes
Record attributes: Carriage return carriage control
RMS attributes: None
Journaling enabled: None
File protection: System:RWED, Owner:RWED, Group:RWED, World:RE
Access Cntrl List: None
Client attributes: None
And this is for a simple file format. Files can be indexed or have
multiple keys.
And there is no cross platform API for retrieving all of these
attributes, so how do you determine how to transmit them?
Security is another issue:
In some cases the binary values for the access control entries need to
be preserved, and in other cases, the text values need to be preserved.
A translation from one set of text or binary values to another set may
also be needed.
And again, there are no cross platform APIs for returning this information.
So a backup type application is going to have to have a lot of platform
specific tweaks, and some way to pass all this varied information
between the client and server. As each platform is added, an extension
may need to be developed.
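One possible way to carry such varied information is a list of tagged entries that a receiver can skip when it does not recognize the platform. The C sketch below is purely hypothetical: the 6 byte header layout, the platform and tag numbers, and the function names are invented here and are not part of any rsync or superlifter protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ATTR_HDR 6  /* platform (2) + tag (2) + length (2), big-endian */

/* Hypothetical identifiers, invented for this sketch. */
enum { PLAT_UNIX = 1, PLAT_VMS = 2 };
enum { VMS_TAG_RECORD_FORMAT = 1, VMS_TAG_FILE_ORG = 2 };

static void put16(unsigned char *p, unsigned v)
{
    p[0] = (unsigned char)((v >> 8) & 0xFF);
    p[1] = (unsigned char)(v & 0xFF);
}

static unsigned get16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Append one attribute entry to buf.
 * Returns the new used length, or 0 if the entry does not fit. */
size_t attr_put(unsigned char *buf, size_t used, size_t cap,
                unsigned platform, unsigned tag,
                const void *value, size_t len)
{
    assert(buf != NULL && used <= cap);
    if (len > 0xFFFF || cap - used < ATTR_HDR + len)
        return 0;
    put16(buf + used, platform);
    put16(buf + used + 2, tag);
    put16(buf + used + 4, (unsigned)len);
    if (len > 0)
        memcpy(buf + used + ATTR_HDR, value, len);
    return used + ATTR_HDR + len;
}

/* Count the entries that belong to one platform, skipping the rest
 * by length without needing to understand their contents. */
size_t attr_count_for(const unsigned char *buf, size_t used,
                      unsigned platform)
{
    size_t pos = 0, n = 0;

    assert(buf != NULL);
    while (used - pos >= ATTR_HDR) {
        size_t len = get16(buf + pos + 4);
        if (used - pos - ATTR_HDR < len)
            break;                          /* truncated entry */
        if (get16(buf + pos) == platform)
            n++;
        pos += ATTR_HDR + len;
    }
    return n;
}
```

The point of the length field is that adding a new platform only adds new tags; existing receivers step over entries they do not understand instead of failing.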
A server definitely needs to know if it is in backup mode as opposed to
file distribution mode.
In file distribution mode, only a few file attributes need to be
preserved, and a loss of precision of dates is usually not a problem.
So while the two applications could be done in a single image, I still
am of the opinion that they should be developed separately.
Maybe share a common support library, but I think that keeping them as
separate programs may be better for support and development.
Especially if you mean for these to be cross platform.
It is likely that the backup function otherwise would only be useful for
a subset of platforms.
Is it fair to burden the people who can only use the file distribution
part of the package with the backup portion when they are porting it?
It just seems that it is not too difficult to come up with a cross
platform file distribution system that uses the principles developed
with rsync.
A backup type application is going to be a problem to make cross
platform, and is likely to be limited to a subset of UNIX systems.
Or maybe a build option to build a full function superlifter, or just a
superlifter lite?
-John
wb8tyw at qsl.net
Personal Opinion Only