better network filesystems
Steve French
smfrench at austin.rr.com
Tue Dec 7 05:06:59 GMT 2004
Michael B Allen wrote:
>Steve French said:
>
>
>>We need better network filesystems - so says newsforge ... ?!
>>
>>
>
>Nah, we just need one good one.
>
>
>
Agreed - we need a smaller number than we have now :) among other reasons
because network filesystems are hard and time-consuming to do, and we
should have it down after 19 years (although for interop purposes
three or four would reasonably be required - a surprising number of
times a customer can give very convincing reasons why they have to
retain some old but working hardware/OS that your clients and/or
servers have to interoperate with).
>> http://www.newsforge.com/article.pl?sid=04/11/29/1528259&from=rss
>>
>>
>
>Shallow article. Wasn't Intermezzo removed from Linux because it wasn't
>being maintained?
>
>
Yes. Intermezzo has been removed.
1) NFSv2/v3 is actively worked on and maintained, but the quite
primitive protocol obviously limits some features that are often requested.
2) The NFSv4 implementation is improving fast, and CITI is working on it
with some help from IBM.
3) Hopefully you all know about the CIFS VFS and Samba
4) AFS has a very small client in the kernel tree, and a better known
implementation, OpenAFS, outside the kernel tree that is considerably
larger. There are various problems with the two implementations, but
there is some maintenance going on, and we have AFS to thank for some
cool VFS features (including the new possibility of (reasonably) caching
locally to disk on the network client). The in-kernel client had a
minor change go in two months ago for credential keyring support.
5) Coda - no significant changeset on it in more than 4 months. My
guess is that it is not as heavily used as the four main ones.
6) ncpfs - not very active either
The other network-like filesystems - Lustre, SANFS, GPFS, and Red Hat's
GFS - do differ a little. They differ in that they attempt stricter
POSIX semantics and therefore view themselves as "cluster" rather than
"network" filesystems. That is an odd distinction ... why shouldn't a
network filesystem simply treat "cluster" as, in effect, a mount option
which would optimize for higher performance to nearby hosts in the
cluster and stricter POSIX file semantics rather than the relaxed "NFS
file semantics"? If they had a good standards story with the IETF and
were in-kernel in 2.6, perhaps no one would care, but it seems odd when you
can make AFS or CIFS or NFSv4 do the same with rather more trivial changes.
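As a rough sketch of the "cluster as a mount option" idea - note that the
"cluster" and "strictposix" option names below are purely hypothetical,
not options the current cifs (or nfs) clients accept, the rest is just the
ordinary Linux mount(2) interface, and the cifs option string is
abbreviated (mount.cifs normally builds the full one):

/* Hypothetical illustration only: "cluster" and "strictposix" are invented
 * option names.  The same share is mounted twice, differing only in the
 * option string that selects the desired semantics. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /* Ordinary network mount: relaxed, NFS/CIFS-style cache semantics */
        if (mount("//server/share", "/mnt/data", "cifs", 0,
                  "user=guest") != 0)
                perror("network mount");

        /* Same share, but asking the client to behave like a cluster fs:
         * stricter POSIX semantics, optimized for nearby hosts */
        if (mount("//server/share", "/mnt/data", "cifs", 0,
                  "user=guest,cluster,strictposix") != 0)
                perror("cluster-semantics mount");

        return 0;
}

The point is simply that one client and one protocol could offer both
behaviors behind a single switch.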
>>cifs client (perhaps by 2.6.12) but it would not be -- that -- hard to
>>fix cifs/Samba or nfsv4 client/server with modest standards and protocol
>>enhancements to achieve their goals, and there are a lot of advantages
>>of Samba over the AFS server :)
>>
>>
>
>AFS but it must stink too or we'd be using it).
>
>CIFS is too many layers of protocol. Think about the
>layers one goes through to write a buffer of data in an RPC -- NetBIOS ->
>SMB -> SMB Transactions -> Named Pipe -> DCE PDU. NFS doesn't integrate
>with The Enterprise easily. Think about what is involved in setting up a
>server, creating an export and granting access to the right users.
>
>
I disagree with that though - CIFS is as simple as I think you can
reasonably get for the key read/write operations.
You have a length field and roughly a 33 byte SMB header, and can
read/write up to 128K with no RPC and no XDR (unlike NFS), and packet
signing is quite simple compared to NFS - and probably lower overhead.
SPNEGO is complicated, as are the Kerberos exchanges, but it is hard to
blame CIFS for that (everyone else is stuck with those IETF standards
too, and they are fairly well proven). Bringing in DCE/RPC is not really
fair either, as the ops that flow over DCE/RPC over CIFS generally have
no equivalent in NFS anyway, and DCE/RPC is not needed or used in
mapping kernel VFS ops to the Samba server (maybe someday mount might
use a few RPCs).

I agree that DCE/RPC is hard for "managing" servers, but have you seen
the current standards-based approach to NAS management? SMI (the
Storage Management Initiative work), with the OpenPegasus tooling
underneath it, is IMO very complicated compared to the current LDAP or
DCE/RPC based approach to managing Samba or Windows servers - and SMI
has a much less ambitious scope, with far fewer NAS things being
manageable. I have found it very complicated just to wade through the
equivalent of the "Hello World" management API in OpenPegasus - much
more complicated than what we suffer through with DCE/RPC, the LANMAN
RPC, and LDAP.
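To make the framing comparison concrete, here is a from-memory sketch of
the wire layout being described - field names and exact types are
illustrative, not copied from fs/cifs/cifspdu.h or the Samba headers,
which are the authoritative references:

/* Sketch (illustrative only) of the CIFS/SMB framing: a 4-byte length
 * prefix, then the fixed ~32-byte SMB header, then the command-specific
 * parameter words and data - no XDR, no RPC layer.  Multi-byte SMB fields
 * are little-endian on the wire; the length prefix is big-endian. */
#include <stdint.h>

struct smb_frame_sketch {
        /* Length prefix (NetBIOS session service / port 445 direct hosting) */
        uint8_t  frame_type;       /* 0x00 = session message                */
        uint8_t  frame_length[3];  /* 24-bit length of the SMB that follows */

        /* Fixed SMB header (32 bytes) */
        uint8_t  protocol[4];      /* 0xFF 'S' 'M' 'B'                      */
        uint8_t  command;          /* e.g. READ_ANDX (0x2E), WRITE_ANDX (0x2F) */
        uint32_t status;           /* NT status (or DOS error class/code)   */
        uint8_t  flags;
        uint16_t flags2;
        uint16_t pid_high;
        uint8_t  signature[8];     /* the packet-signing MAC goes here      */
        uint16_t reserved;
        uint16_t tid;              /* tree (share) id                       */
        uint16_t pid_low;
        uint16_t uid;              /* session id                            */
        uint16_t mid;              /* multiplex id                          */

        /* Variable part follows: a word count, the parameter words, a byte
         * count, and then the data - which for large READ_ANDX/WRITE_ANDX
         * is where the up-to-128K payload rides. */
        uint8_t  word_count;
} __attribute__((packed));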
CIFS of course has too many verbs and infolevels for similar operations
(as did ncpfs, for that matter), the header length is not an even 40
bytes, and there are fields that line up at the wrong boundaries in the
frame to be perfectly efficient, but I think it is a little cleaner than
NFS -> XDR/SunRPC from the wire perspective. Just look at how simple
enabling IPv6 was for CIFS (I have only looked at this from the
perspective of the core protocol, not the DCE/RPC layer) - it changed
little or nothing; compare that with the others.
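As a user-space illustration of why the address family barely matters to
the core protocol (the in-kernel client sets up its socket differently,
but the idea is the same):

/* Illustration only: nothing in the SMB frames themselves encodes an IPv4
 * or IPv6 address, so only this transport-setup code cares which address
 * family the server uses.  getaddrinfo() hands back v4 or v6 addresses
 * transparently and the SMB code above this function never changes. */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int connect_smb_server(const char *host)
{
        struct addrinfo hints, *res, *ai;
        int fd = -1;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 - we do not care */
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, "445", &hints, &res) != 0)
                return -1;

        for (ai = res; ai != NULL; ai = ai->ai_next) {
                fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
                if (fd < 0)
                        continue;
                if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
                        break;        /* connected; SMB negotiation starts here */
                close(fd);
                fd = -1;
        }
        freeaddrinfo(res);
        return fd;                    /* -1 on failure, else a TCP socket */
}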
I am also struck by how many miscellaneous little tools and utility
programs depend on the filesystem protocol, its mount options, etc. - it
is not as simple as just inventing some new intergalactic cluster
distributed network filesystem protocol. Since we know CIFS and (mostly)
know what is suboptimal about it, I prefer taking what we know and
trivially extending and fixing it (for optimal behavior with open
clients, so they can get excellent semantics and performance), as jra,
tridge et al. are doing now, while maintaining compatibility with the
Windows half as best we reasonably can.
It is easy to make design mistakes (which we realize as we poke holes in
SMB/CIFS and NFS), but with all of the discussion about new cluster
filesystems, I worry that new, unrelated protocols don't necessarily fix
the old mistakes (made by the protocols we have learned to love and
hate) as they reinvent the wheel ... :)