Implementing a mount command

Sun Feb 17 17:36:11 EST 2002

On Sun, Feb 17, 2002 at 04:20:18PM +1100, Michael Still wrote:
> I was hoping for a userland only solution, because this strikes me as
> being a lot easier to debug, and less likely to take my machine to a
> better place.
> 
> I wrote a three line mount program (just says hello in main()), and put it
> in /usr/local/bin (the same location as smbmount). When I try a mount -t
> mikal blah /tmp/blah I get a message saying 'mikal' isn't supported by the
> kernel.
> 
> Does this imply that it is impossible to have a userland only filesystem?
> 
It's impossible to have a userland /only/ filesystem - you need to
have some way of translating the kernel space system calls into
userland calls, and the only way to do that is to have kernel code.
It's possible to have a very thin layer that simply copies the VFS
method calls out to userspace and then receives replies, but that
makes it very dependent on the current implementation of the VFS.
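
To make the "thin layer" idea a bit more concrete, here is a rough
userspace sketch of what copying a call out to a daemon might look
like. Everything in it (the toy_ names, the opcodes, the message
layout, the inode number 42) is made up for illustration; it is not
coda's protocol or any real kernel interface. Note that each opcode
mirrors one VFS method, which is exactly the coupling to the VFS
implementation mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes: one per VFS method forwarded to userspace. */
enum toy_opcode { TOY_OP_LOOKUP = 1, TOY_OP_READ = 2 };

/* Hypothetical wire format for a forwarded call and its answer. */
struct toy_request {
    uint32_t opcode;   /* which method is being forwarded */
    uint32_t unique;   /* lets the kernel side match a reply to a request */
    uint64_t inode;    /* object the call applies to */
    char     name[64]; /* path component, used by lookup */
};

struct toy_reply {
    uint32_t unique;   /* copied from the request */
    int32_t  result;   /* 0 on success, negative on failure */
    uint64_t inode;    /* inode found by lookup */
};

/* "Kernel" side: pack a lookup(dir, name) call into a message. */
static void toy_pack_lookup(struct toy_request *req, uint32_t unique,
                            uint64_t dir, const char *name)
{
    assert(req != NULL && name != NULL);
    memset(req, 0, sizeof *req);
    req->opcode = TOY_OP_LOOKUP;
    req->unique = unique;
    req->inode = dir;
    /* The memset above guarantees the name stays NUL terminated. */
    strncpy(req->name, name, sizeof req->name - 1);
}

/* "Userspace daemon" side: answer one request. This toy daemon only
 * knows a single entry called "blah". */
static void toy_daemon_handle(const struct toy_request *req,
                              struct toy_reply *rep)
{
    memset(rep, 0, sizeof *rep);
    rep->unique = req->unique;
    if (req->opcode == TOY_OP_LOOKUP && strcmp(req->name, "blah") == 0) {
        rep->result = 0;
        rep->inode = 42;
    } else {
        rep->result = -1; /* unknown name or unsupported opcode */
    }
}
```

In a real setup the request and reply would travel over a character
device or similar channel rather than a direct function call.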

The canonical example is coda - it's a kernel module that implements
a translation between the Linux VFS and a generalised VFS interface
that they've defined. Build the coda module, load it, and then try
# mount -t coda none /mnt/
You won't get anything useful unless you have the userspace code for
coda, but the kernel will recognise the filesystem type - it's the
kernel layer that mount needs to be told about.

Take a look at Documentation/filesystems/coda.txt for the userspace
interface . . . It's designed around coda's conceptual setup, which
isn't ideal for a disk-based fs, but it's sufficiently general to
handle that kind of thing. 

Hmmm . . . This isn't a very coherent answer . . .

<lecture>
The Linux VFS/fs system is layered: you have the VFS (which stands
for Virtual FileSystem) which is the backend to the filesystem
syscalls, and underneath that are the individual filesystem drivers. 

The VFS is a collection of general code, and an interface that the 
filesystem drivers implement for servicing the system calls and VFS
calls. The idea is to have an interface that lets you have multiple
filesystem drivers without having to munge the kernel hideously - it
works quite well, since I count about 40 separate filesystem drivers
under linux/fs/ . . . 

Filesystem drivers implement all the low level stuff - the 'real'
filesystem. The VFS processes syscalls by calling the appropriate
filesystem driver method. The filesystem driver has to set up the
method pointers to point to its own code when it's creating an
object - after that, the VFS simply dereferences the pointer and
calls into that code. It's really neat, and it makes implementing a
filesystem in the kernel relatively simple . . .
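
The method pointer arrangement can be sketched in plain userspace C.
This is a toy model under my own assumptions: the toy_ and hello_
names are invented here and are not the kernel's real structures or
signatures. It only shows the shape of the idea: the driver fills in
a table of function pointers when it creates an object, and the
generic layer calls through that table without knowing which driver
is behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Table of methods a filesystem driver fills in (hypothetical). */
struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops; /* set by the driver at creation */
    const char *data;               /* driver-private contents */
};

/* "Driver" side: one concrete read implementation. */
static long hello_read(struct toy_file *f, char *buf, size_t len)
{
    size_t n = strlen(f->data);
    if (n > len)
        n = len;            /* never copy more than the caller asked for */
    memcpy(buf, f->data, n);
    return (long)n;
}

static const struct toy_file_ops hello_ops = { .read = hello_read };

/* The driver creates the object and points it at its own methods. */
static void hello_init_file(struct toy_file *f)
{
    assert(f != NULL);
    f->ops = &hello_ops;
    f->data = "hello";
}

/* "VFS" side: knows nothing about the driver, it just follows the
 * pointer and calls whatever is there. */
static long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    if (f->ops == NULL || f->ops->read == NULL)
        return -1;          /* the driver did not provide this method */
    return f->ops->read(f, buf, len);
}
```

A second driver would supply its own ops table and init function;
toy_vfs_read() would not need to change.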

When you load a filesystem driver it registers itself with the
kernel: it passes the kernel a name and a struct containing
filesystem wide methods. When you call mount, you pass it the name,
and the kernel looks through the list of registered filesystems. If
it finds the name, then it calls the fs's initialisation method
(read_super()), which sets up the root directory and so forth. If it
doesn't find the name, it returns with the unsupported filesystem
error. 
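
Registration and the mount-time lookup can be modelled the same way.
Again this is a userspace sketch, not kernel code: the toy_ names,
the fixed-size table, and the TOY_ENOTSUPP value are all assumptions
made for the example. It shows why mount -t mikal fails until
something has registered the name "mikal".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FS   8
#define TOY_ENOTSUPP (-1) /* made-up code for "unsupported filesystem" */

/* Stand-in for the per-mount state that read_super() sets up. */
struct toy_super {
    const char *root_name;
};

/* What a driver hands over when it registers: a name and its
 * filesystem-wide methods (only read_super in this sketch). */
struct toy_fs_type {
    const char *name;
    int (*read_super)(struct toy_super *sb);
};

static struct toy_fs_type *toy_registered[TOY_MAX_FS];
static int toy_nr_registered;

/* Called by a driver when it is loaded. */
static int toy_register_filesystem(struct toy_fs_type *fs)
{
    assert(fs != NULL && fs->name != NULL);
    if (toy_nr_registered == TOY_MAX_FS)
        return -1;                       /* table is full */
    toy_registered[toy_nr_registered++] = fs;
    return 0;
}

/* Called on mount: search the registered names, then let the
 * matching driver initialise the superblock. */
static int toy_mount(const char *type, struct toy_super *sb)
{
    for (int i = 0; i < toy_nr_registered; i++) {
        if (strcmp(toy_registered[i]->name, type) == 0)
            return toy_registered[i]->read_super(sb);
    }
    return TOY_ENOTSUPP;                 /* nobody registered this name */
}

/* A hypothetical driver for the "mikal" filesystem type. */
static int mikal_read_super(struct toy_super *sb)
{
    sb->root_name = "/";                 /* set up the root directory */
    return 0;
}

static struct toy_fs_type mikal_fs = {
    .name = "mikal",
    .read_super = mikal_read_super,
};
```

Calling toy_mount("mikal", &sb) before toy_register_filesystem(&mikal_fs)
returns TOY_ENOTSUPP; afterwards it returns 0. The sketch does not
check for duplicate names.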

When you call open(), the VFS starts by calling the root
directory's lookup() method to get the inode that the first element
in the path you gave points to. If this is a directory inode, it
calls that inode's lookup(), and continues like this until it hits
the end of the path and has an inode to play with. It then calls
that inode's open() method (if there is one), which does whatever
file open stuff is needed for that filesystem.
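
The path walk described above might look roughly like this in a toy
userspace model. The structures and toy_ names are invented for the
example, and the real kernel does a good deal more (caching,
locking, mount points, symlinks), none of which is modelled here.
Directories supply a lookup() method, files may supply an open()
method, and the walker only ever calls through the pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_inode;

/* Per-inode methods (hypothetical); either pointer may be NULL. */
struct toy_inode_ops {
    struct toy_inode *(*lookup)(struct toy_inode *dir,
                                const char *name, size_t len);
    int (*open)(struct toy_inode *inode);
};

struct toy_dirent {
    const char *name;
    struct toy_inode *inode;
};

struct toy_inode {
    const struct toy_inode_ops *ops;
    const struct toy_dirent *entries; /* NULL for a regular file */
    size_t nr_entries;
    int open_count;                   /* bumped by toy_file_open() */
};

/* Directory method: find one path component among the entries. */
static struct toy_inode *toy_dir_lookup(struct toy_inode *dir,
                                        const char *name, size_t len)
{
    for (size_t i = 0; i < dir->nr_entries; i++) {
        const struct toy_dirent *e = &dir->entries[i];
        if (strlen(e->name) == len && memcmp(e->name, name, len) == 0)
            return e->inode;
    }
    return NULL;
}

/* File method: whatever "open" means for this filesystem. */
static int toy_file_open(struct toy_inode *inode)
{
    inode->open_count++;
    return 0;
}

static const struct toy_inode_ops toy_dir_ops  = { .lookup = toy_dir_lookup };
static const struct toy_inode_ops toy_file_ops = { .open = toy_file_open };

/* "VFS" side: walk the path one component at a time, then call the
 * final inode's open() if it has one. Returns NULL on failure. */
static struct toy_inode *toy_open(struct toy_inode *root, const char *path)
{
    struct toy_inode *cur = root;

    assert(root != NULL && path != NULL);
    while (*path != '\0') {
        while (*path == '/')
            path++;                       /* skip separators */
        size_t len = strcspn(path, "/");  /* length of this component */
        if (len == 0)
            break;                        /* only trailing slashes left */
        if (cur->ops->lookup == NULL)
            return NULL;                  /* not a directory */
        cur = cur->ops->lookup(cur, path, len);
        if (cur == NULL)
            return NULL;                  /* no such entry */
        path += len;
    }
    if (cur->ops->open != NULL && cur->ops->open(cur) != 0)
        return NULL;
    return cur;
}
```

With a root directory containing "tmp", which in turn contains a
file "blah", toy_open(&root, "/tmp/blah") calls lookup() twice and
then the file's open().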

The other system calls are serviced in much the same way - the VFS
translates between the syscall interface and the interface it
presents to the filesystems, and the filesystems provide methods to
implement that interface. The important thing to recognise is that
you /need/ to have some code in the kernel to implement the VFS
interface - it's the only way you can get access to the filesystem
syscalls you want to service. What you do with your kernel code is
entirely up to you - you can write to a block device, read and write
kernel parameters, manage device access, pass processing off to
userspace, or crash randomly . . . But without the kernel code, you
can't do a filesystem.
</lecture>

The short answer is: use coda, or one of the other interfaces that
are out there. Or write your own kernel code ;-) 

Of course, I could be wrong about some of this - I've only been playing
around with this stuff for a year or so, and lots of that time's
been wasted on things like classes and the like ;-) But this is
correct to the best of my knowledge.

Simon

-- 
PGP public key Id 0x144A991C, or ftp://bg77.anu.edu.au/pub/himi/himi.asc
(crappy) Homepage: http://bg77.anu.edu.au
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://bg77.anu.edu.au/pub/mirrors/css/ 