better network filesystems

Tue Dec 7 05:06:59 GMT 2004

Michael B Allen wrote:

>Steve French said:
>  
>
>>We need better network filesystems - so says newsforge ... ?!
>>    
>>
>
>Nah, we just need one good one.
>
>  
>
Agreed - we need a smaller number than now :)   Among other reasons,
network filesystems are hard and time consuming to do, and we
should have it down after 19 years (although for interop purposes
three or four would reasonably be required - surprisingly often a
customer can give very convincing reasons why they have to retain
some old working hardware/OS that your clients and/or servers have
to interop with).

>>    http://www.newsforge.com/article.pl?sid=04/11/29/1528259&from=rss
>>    
>>
>
>Shallow article. Wasn't Intermezzo removed from Linux because it wasn't
>being maintained?
>  
>
Yes.  Intermezzo has been removed.

1) NFSv2/v3 is actively worked on and maintained, but obviously the quite
primitive protocol limits some features that are often requested.
2) NFSv4 is fast improving in implementation, and CITI is working on it
with some help from IBM.
3) Hopefully you all know about the CIFS VFS and Samba.
4) AFS has a very small client in the kernel tree, and another, better
known implementation, OpenAFS, which is not in the kernel tree and is
considerably larger.   There are various problems with the two
implementations, but there is some maintenance going on, and we have AFS
to thank for some cool VFS features (including the new possibility of
(reasonably) caching to local disk on the network client).  The inkernel
client had a minor change go in two months ago for credential keyring
support.
5) Coda - no significant changeset on it in more than 4 months.  My
guess is that it is not as heavily used as the four main ones.
6) ncpfs - not very active either.

The other network-like filesystems - Lustre, SANFS, GPFS, and RedHat's
GFS - do differ a little.  They attempt stricter POSIX semantics and
therefore view themselves as "cluster" rather than "network"
filesystems (an odd distinction ... why shouldn't a network filesystem
simply treat "cluster" as, in effect, a mount option that optimizes for
higher performance to nearby hosts in the cluster and for stricter POSIX
file semantics rather than relaxed "nfs file semantics"?).   If they had
a good standards story with the IETF and were inkernel in 2.6, perhaps
no one would care, but it seems odd when you can make AFS or CIFS or
NFSv4 do the same with rather more trivial changes.

>>cifs client (perhaps by 2.6.12) but it would not be -- that -- hard to
>>fix cifs/Samba or nfsv4 client/server with modest standards and protocol
>>enhancements to achieve their goals, and there are a lot of advantages
>>of Samba over the AFS server :)
>>    
>>
>
>AFS but it must stink too or we'd be using it). 
>
>CIFS is too many layers of protocol. Think about the
>layers one goes through to write a buffer of data in an RPC -- NetBIOS ->
>SMB -> SMB Transactions -> Named Pipe -> DCE PDU. NFS doesn't integrate
>with The Enterprise easily. Think about what is involved in setting up a
>server, creating an export and granting access to the right users.
>  
>
I disagree with that though - CIFS is as simple as I think you can
reasonably get for the key read/write operations.
You have a length field and about a 33 byte SMB header and can
read/write up to 128K with -- no RPC -- no XDR (unlike NFS), and packet
signing is quite simple compared to NFS - and probably less overhead.
SPNEGO is complicated, as are the Kerberos exchanges, but it is hard to
blame CIFS for that (everyone else is stuck with those IETF standards
too, and they are fairly well proven).   To bring in DCE/RPC is not
really fair, as the ops that flow over DCE/RPC over CIFS generally have
no equivalent in NFS anyway, and DCE/RPC is not needed/used in mapping
kernel VFS ops to the Samba server (maybe someday in mount you
-- might -- use a few RPCs).   I agree that DCE/RPC is hard for
"managing" servers, but have you seen the current standards based
approach to NAS management?   SMI (the Storage Management Initiative
work) and the OpenPegasus tooling underneath it are IMO very
complicated compared to the current LDAP or DCE/RPC based approach to
management of Samba or Windows servers - and SMI has a very much less
ambitious scope, with far fewer NAS things being managed.   I have found
it very complicated to wade through the equivalent of the "Hello World"
management API in OpenPegasus - much more complicated than what we
suffer through with DCE/RPC, the LANMAN RPC, and LDAP.
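
As a rough illustration of the "length field plus small SMB header"
framing mentioned above, here is a minimal Python sketch.  It assumes
the commonly documented SMB1 layout: a 4-byte length prefix, a fixed
32-byte header, then a one-byte word count (which would account for
the "about 33 bytes").  The helper name build_frame and the sample
tid/uid/mid values are hypothetical, and the sketch uses a simple echo
request rather than a real read/write, which carries more parameter
words.

```python
import struct

# Fixed SMB1 header, little-endian, no padding (32 bytes in total):
# magic, command, status, flags, flags2, PID high, signature,
# reserved, TID, PID low, UID, MID.
SMB_HEADER = struct.Struct("<4sBIBHH8sHHHHH")

SMB_COM_ECHO = 0x2B  # echo request command code


def build_frame(command, tid, uid, mid, params=b"", data=b""):
    """Hypothetical helper: build one length-prefixed SMB1 message."""
    if len(params) % 2:
        raise ValueError("parameter block must be whole 16-bit words")
    header = SMB_HEADER.pack(
        b"\xffSMB",   # protocol magic
        command,
        0,            # status
        0,            # flags
        0,            # flags2
        0,            # PID high
        b"\x00" * 8,  # security signature (unsigned in this sketch)
        0,            # reserved
        tid,
        0,            # PID low
        uid,
        mid,
    )
    body = (
        header
        + struct.pack("<B", len(params) // 2)  # word count
        + params
        + struct.pack("<H", len(data))         # byte count
        + data
    )
    if len(body) >= 1 << 24:
        raise ValueError("message too large for the length prefix")
    # Length prefix: a zero byte followed by a big-endian length.
    return struct.pack(">I", len(body)) + body


if __name__ == "__main__":
    echo_params = struct.pack("<H", 1)  # echo count = 1
    frame = build_frame(SMB_COM_ECHO, tid=1, uid=2, mid=3,
                        params=echo_params, data=b"ping")
    print(SMB_HEADER.size, len(frame))
```

Everything before the parameter words is fixed-size and packed with a
single struct call; there is no separate RPC or XDR encoding step in
this path.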

CIFS has too many verbs and infolevels, of course, for similar
operations (as did ncpfs for that matter), the header length is not an
even 40 bytes, and there are fields that line up at the wrong boundaries
in the frame to be perfectly efficient, but I think it is a little
cleaner than NFS -> XDR/SunRPC from the wire perspective.   Just look at
how simple enabling IPv6 was for CIFS (I have only looked at this from
the perspective of the core protocol, not the DCE/RPC layer) - it
changed little or nothing - and compare that with the others.   I am
also struck by how many misc. little tools and utility programs depend
on the filesystem protocol, its mount options etc. - it is not as simple
as just inventing some new intergalactic cluster distributed network
filesystem protocol.     Since we know CIFS and know what is suboptimal
(mostly), I prefer just taking what we know and trivially extending and
fixing it (for optimal behavior with open clients - so they can get
excellent semantics and performance), as jra, tridge et al are doing
now, while maintaining compat with the Windows half as best we
reasonably can.

It is easy to make design mistakes (which we realize as we poke holes in
SMB/CIFS and NFS), but I worry, with all of the discussion about new
cluster filesystems, that new unrelated protocols don't necessarily fix
old mistakes (made by the protocols we have learned to love and hate) as
they reinvent the wheel  ...  :)