Thoughts on dos cr/lf conversion

Wed Nov 11 06:22:08 GMT 1998

Here is a first pass at a design for a cr/lf conversion vfs module.
Comments appreciated!  I'm sure other people have thought longer about
this problem.

For each open file, store blocks of text separated by \n characters.  With
each text block, keep an offset into the unix file and a calculated offset
into the dos file (unix offset + number of blocks before this one, since
each earlier block gains one \r).  These offsets can be used for seeks.
Reads would iterate over the blocks.  Writes would be broken up into \n
separated blocks, with the translated text written to disk and the block
data structures updated to match.  Truncating a file would mean disposing
of every block past the truncation point.  Writing in place is a bit
trickier.  All this information would be generated on demand, with the
file sizes cached somewhere in a "hidden" file in the directory, as
otherwise just listing a directory could cause every file to be opened
and completely read.  )-:

The success of this method would hinge on how many optimisations can be
done, as there is going to be a lot of extra state to maintain for each
file opened by samba.  Off the top of my head: lazy updating of the text
block data structures, caching, fast searching for a particular file
position, and storing/not storing the actual text of each block in the
data structure to trade off speed vs memory.  I'm thinking of some sort
of balanced tree or maybe a hash table to find file positions quickly.
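
As a simpler stand-in for the balanced tree, a sorted list of block start
offsets plus a binary search already gives O(log n) lookups.  The sketch
below is Python, and locate is a made-up name.  It assumes dos_offsets
holds the dos start offset of every block in ascending order.

```python
from bisect import bisect_right


def locate(dos_offsets, dos_pos):
    """Return (block index, offset inside that block's dos text) for dos_pos.

    dos_offsets must be sorted ascending.  A position past the end of the
    file maps to the last block, so the caller still has to check it
    against the block's length.
    """
    i = bisect_right(dos_offsets, dos_pos) - 1
    if i < 0:
        raise ValueError("position is before the first block, or no blocks")
    return i, dos_pos - dos_offsets[i]
```

A sorted list is cheap to search but expensive to update when a write
changes the number of lines, since every later offset shifts.  That is
where a balanced tree would start to pay off.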

A question:  What is the best way to differentiate between binary files
(executables and the like) and text files?  You could have a configurable
list of filename extensions to treat as text and/or a list of extensions
to treat as binary.  Perhaps some sort of algorithm like file(1) uses -
just check the first 32 bytes or so and see if they are ascii characters.
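
A rough Python sketch of how the two ideas could be combined, with the
extension lists checked first and the 32 byte test used as a fallback.
The extension lists and function names here are invented for the example,
and the byte test is much cruder than what file(1) really does.

```python
import os

# Hypothetical defaults; in practice these would come from configuration.
TEXT_EXTENSIONS = {".txt", ".c", ".h"}
BINARY_EXTENSIONS = {".exe", ".zip", ".gif"}

# Printable ascii plus tab, line feed and carriage return.
ALLOWED_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_text(sample):
    """True if every byte in sample is printable ascii or common whitespace."""
    return all(byte in ALLOWED_BYTES for byte in sample)


def is_text_file(name, head):
    """Guess whether a file is text from its name and its first bytes."""
    ext = os.path.splitext(name)[1].lower()
    if ext in TEXT_EXTENSIONS:
        return True
    if ext in BINARY_EXTENSIONS:
        return False
    # Unknown extension: fall back to looking at the first 32 bytes.
    return looks_like_text(head[:32])
```

Note that an empty file counts as text here, and a binary file whose first
32 bytes happen to be ascii would be misdetected.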

I can't believe the number of "how do I convert cr/lf" messages on the
comp.protocols.smb newsgroup.  Must be a popular thing to try and do...

Tim.