LF-delimited files are corrupted when written to a Samba/VMS share
Ben Armstrong
BArmstrong at dymaxion.ca
Tue Jun 27 13:12:21 GMT 2006
A problem we've been unable to resolve for some time is that
LF-delimited files are corrupted when they are written to a Samba/VMS
share: every record becomes double-spaced. I have raised this problem
at least once before (most recently, by my records, in August 2005) but
have no record of receiving an answer.
A trivial Ruby script demonstrates the problem. (C++ and Perl test
programs reproduce it too; see the Perl example at the end of this
message.) The client system in this test case is a Linux system, which
treats a bare LF as a newline:
$ ruby -e 'puts "a\nb"' >bg1.tmp
A:BG> dir/full bg1.tmp
...
Record format: Stream, maximum 0 bytes, longest 32767 bytes
...
A:BG> dump/rec bg1.tmp
...
Record number 1 (00000001), 2 (0002) bytes, RFA(0001,0000,0000)
0A61
a............................... 000000
Record number 2 (00000002), 2 (0002) bytes, RFA(0001,0000,0002)
0A62
b............................... 000000
A:BG> dump bg1.tmp
...
00000000 00000000 00000000 00000000 00000000 00000000 00000000 0A620A61
a.b............................. 000000
...
A:BG> type bg1.tmp
a

b

A:BG>
You can see why "type" prints the file double-spaced. The file now
consists of two records, each containing two characters *including* the
LF line delimiter. "type" ends each record with its own line terminator,
so the embedded LF adds a blank line after every line. Stream is
apparently a very "forgiving" file format: it does consider records to
end at LF characters, as well as at a variety of other delimiters
(form-feed, vertical tab, CR+LF, etc.). However, of all these
delimiters, only the CR+LF pair seems to be excluded from the record
itself.
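The record splitting described above can be sketched as a small Ruby model. It is a simplified illustration built only from the behaviour observed in this thread (records end at CR+LF, LF, form-feed or vertical tab, and only CR+LF is stripped); it is not RMS code, and the function name `stream_records` is hypothetical.

```ruby
# Simplified model of how RMS divides a Stream-format file into records.
# Assumption: encodes only the behaviour observed in this thread, not RMS
# itself. Records end at CR+LF, LF, FF or VT; only CR+LF is stripped.
STREAM_TERMINATOR = /\r\n|[\n\x0B\x0C]/

def stream_records(bytes)
  records = []
  pos = 0
  while (m = STREAM_TERMINATOR.match(bytes, pos))
    body = bytes[pos...m.begin(0)]
    # CR+LF is dropped; any other terminator stays inside the record.
    records << (m[0] == "\r\n" ? body : body + m[0])
    pos = m.end(0)
  end
  # Keep any trailing data that has no terminator.
  records << bytes[pos..-1] if pos < bytes.length
  records
end

p stream_records("a\nb\n")      # bytes of bg1.tmp (written from Linux)
p stream_records("a\r\nb\r\n")  # CR+LF-delimited bytes (written from Windows)
```

Against the bytes of bg1.tmp the model yields the records "a\n" and "b\n", two bytes each, as DUMP/REC reported; against CR+LF-delimited bytes it yields "a" and "b".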
The difficulty is that there is no way I know of for the client
application or system to convey that the file is LF-delimited, must
remain LF-delimited, and should therefore be written as Stream_LF. The
end result is that any LF-delimited file written to the Samba share is
corrupted: as far as RMS is concerned, it becomes a double-spaced file.
The corruption compounds as the file is read and written further from
both systems: it becomes double-double-spaced, then
double-double-double-spaced, and so on.
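One way the compounding might arise can be sketched in Ruby. This is a hypothetical model resting on two assumptions not taken from RMS or Samba: the share stores the client's bytes unchanged in a Stream file (so each record keeps its LF), and a VMS program then reads the records and writes each one back followed by a fresh LF, as a Stream_LF writer such as Perl would. The helper names are illustrative only.

```ruby
# Hypothetical model of the compounding corruption. Assumptions, not
# taken from RMS or Samba: (1) the share stores the client's bytes
# unchanged in a Stream file, so each record keeps its trailing LF;
# (2) a VMS program reads each record and writes it back out followed
# by a fresh LF, as a Stream_LF writer would.
def stream_records_keeping_lf(bytes)
  # Every LF ends a record and stays in it; keep an unterminated tail.
  bytes.scan(/[^\n]*\n|[^\n]+\z/)
end

def vms_round_trip(bytes)
  stream_records_keeping_lf(bytes).map { |rec| rec + "\n" }.join
end

text = "a\nb\n"
1.upto(3) do |pass|
  text = vms_round_trip(text)
  puts "after pass #{pass}: #{text.lines.count} lines, #{text.inspect}"
end
```

Under these assumptions the line count doubles on every pass (4, 8, then 16 lines), which is consistent with the escalating double-spacing described above.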
A Windows client system running natively compiled Ruby, which treats a
newline as CR+LF, does not exhibit the problem. Here are the results
when the file is created on the Windows system with the same Ruby test:
$ ruby -e 'puts "a\nb"' >bg2.tmp
A:BG> dir/full bg2.tmp
...
Record format: Stream, maximum 0 bytes, longest 32767 bytes
...
A:BG> dump/rec bg2.tmp
...
Record number 1 (00000001), 1 (0001) byte, RFA(0001,0000,0000)
61
a............................... 000000
Record number 2 (00000002), 1 (0001) byte, RFA(0001,0000,0003)
62
b............................... 000000
A:BG> dump bg2.tmp
...
00000000 00000000 00000000 00000000 00000000 00000000 00000A0D 620A0D61
a..b............................ 000000
...
A:BG> type bg2.tmp
a
b
A:BG>
As you can see, we now get the expected result: a single-spaced file
with CR+LF-delimited lines, stored as a "Stream" type file. RMS treats
each CR+LF as the end of a record and does not consider it part of the
record itself.
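Reading the DUMP listings takes some care: as the listings above show, DUMP prints each longword with its lowest-addressed byte on the right, so the hex columns read right to left. A small Ruby helper, `decode_dump_hex` (a hypothetical name), recovers the file bytes from such a line. It assumes the argument contains only the hex columns and that trailing zero bytes are unused block space past end-of-file.

```ruby
# Decode the hex columns of one VMS DUMP line back into file bytes.
# The hex runs right to left (highest address on the left), so
# reversing the whole sequence of byte pairs restores file order.
def decode_dump_hex(hex_columns)
  pairs = hex_columns.split.join.scan(/../)
  # Assumption: zero bytes at the high-address (left) end are unused
  # block space past end-of-file, so drop them.
  pairs = pairs.drop_while { |h| h == "00" }
  [pairs.reverse.join].pack("H*")
end

crlf = "00000000 00000000 00000000 00000000 00000000 00000000 00000A0D 620A0D61"
lf   = "00000000 00000000 00000000 00000000 00000000 00000000 00000000 0A620A61"
p decode_dump_hex(crlf)  # bg2.tmp, written from Windows
p decode_dump_hex(lf)    # bg1.tmp, written from Linux
```

For bg2.tmp this gives "a\r\nb\r\n" and for bg1.tmp "a\nb\n", confirming that only the line terminators differ between the two files.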
So if all applications on all three platforms could agree to use CR+LF
as the "canonical" text file format, we wouldn't have this problem.
However, Ruby on VMS and Perl on VMS (the more popular and widely used
of the two) both insist on writing Stream_LF files by default. For
example:
A:BG> perl -e "print ""a\nb\n"";" >bg3.tmp
A:BG> dir/full bg3.tmp
...
Record format: Stream_LF, maximum 0 bytes, longest 32767 bytes
...
A:BG> dump/rec bg3.tmp
...
Record number 1 (00000001), 1 (0001) byte, RFA(0001,0000,0000)
61
a............................... 000000
Record number 2 (00000002), 1 (0001) byte, RFA(0001,0000,0002)
62
b............................... 000000
A:BG> dump bg3.tmp
...
00000000 00000000 00000000 00000000 00000000 00000000 00000000 0A620A61
a.b............................. 000000
...
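The dumps of bg1.tmp and bg3.tmp show identical bytes (0A620A61), so the difference lies entirely in the record format attribute. Below is a simplified Ruby model of Stream_LF, again inferred from the DUMP/REC output rather than from RMS documentation; `stream_lf_records` is a hypothetical name.

```ruby
# Simplified model of Stream_LF: each LF ends a record and is not part
# of it. Assumption: inferred from the DUMP/REC output above, not RMS docs.
def stream_lf_records(bytes)
  records = bytes.split("\n", -1)  # -1 keeps empty records
  # split leaves an empty string after a final LF; that is not a record.
  records.pop if records.last == ""
  records
end

# bg1.tmp and bg3.tmp hold the same bytes (0A620A61 in both dumps).
p stream_lf_records("a\nb\n")
```

It returns "a" and "b", one byte each, matching the DUMP/REC listing for bg3.tmp, whereas the same bytes stored as Stream give two-byte records that keep their LF.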
Furthermore, not every Windows application writes CR+LF line terminators
in text files. For instance, Vim for Windows (the text editor we use)
can read both "unix" (LF-delimited) and "dos" (CR+LF-delimited) text
files, and preserves the original line terminator type when it writes
the file out again. With an increasing number of open source
applications being ported to Windows and expected to operate correctly
in a mixed-platform environment, LF-delimited text files written from
Windows systems are now a fact of life that cannot easily be worked
around.
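The distinction Vim makes between "unix" and "dos" files can be approximated by counting terminators. The Ruby sketch below is only loosely modelled on Vim's 'fileformat' detection (Vim's real rules also involve the 'fileformats' option and a "mac" format); the function `line_format` and its rule that a file is "dos" only when every LF is preceded by CR are assumptions made for illustration.

```ruby
# Classify text as :unix (bare LF) or :dos (CR+LF).
# Assumption: a simplified heuristic, not Vim's actual algorithm.
# A file counts as :dos only if every LF is preceded by a CR.
def line_format(bytes)
  lf_count   = bytes.count("\n")
  crlf_count = bytes.scan("\r\n").length
  return :unknown if lf_count.zero?  # no line terminators at all
  crlf_count == lf_count ? :dos : :unix
end

p line_format("a\nb\n")      # bg1.tmp-style content
p line_format("a\r\nb\r\n")  # bg2.tmp-style content
```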
In conclusion, it is not a practical solution to insist that all text
files be written CR+LF-delimited. Samba/VMS *must* accommodate
LF-delimited text files somehow. Without a solution to this problem,
the product's usefulness in a production cross-platform environment is
seriously limited. If anyone has an idea how we can solve the problem
effectively, please share it!
Ben