[PATCH 3/3 v4] vfs_glusterfs: Samba VFS module for glusterfs

Fri May 24 17:17:48 MDT 2013

On 5/24/13 5:18 AM, Simo wrote:
>> +    for (i = 0; i < glfd_fd_size; i++)
>> +        if (!glfd_fd[i])
>> +            break;
>
> Would be better to have braces around here (for both statements).
> In general we recommend braces even for one line if statements, as we've
> been repeatedly bitten in the past by modifications that added a new
> statement to the if w/o realizing braces were not there.

Done.

>> +    glfd_fd_used++;
>> +    glfd_fd[i] = glfd;
>> +    return i;
>> +}
>> +
>> +    for (i = 0; i < glfs_preopened_size; i++)
>> +        if (!glfs_preopened[i].volume)
>> +            break;
>
> Same as above.

Done.

>> +static void glfs_clear_preopened(glfs_t *fs)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < glfs_preopened_size; i++) {
>> +        if (glfs_preopened[i].fs == fs) {
>> +            if (--glfs_preopened[i].ref)
>> +                return;
>> +
>> +            talloc_free(glfs_preopened[i].volume);
>> +            glfs_fini(glfs_preopened[i].fs);
>> +            ZERO_STRUCTP(&glfs_preopened[i]);
>> +            return;
>> +        }
>> +    }
>
> By using an array you are forced to create this cleanup function,
> however if you use a doubly linked list you could simply free the fs and
> use a talloc destructor to perform the clean up. Have you though about
> doing that ?

I did not look around enough in the samba code to find a list template 
and did not want to write list primitives from scratch. Is there a 
specific list template in samba which you recommend?

>> +
>> +    assert(!"filesystem context not found");
>
> I would prefer no asserts, and a DEBUG level 0 error.

Done.

>> +static struct tevent_req *vfs_gluster_pread_send(struct
>> vfs_handle_struct
>> +                         *handle, TALLOC_CTX *mem_ctx,
>> +                         struct tevent_context *ev,
>> +                         files_struct *fsp, void *data,
>> +                         size_t n, off_t offset)
>> +{
>> +    errno = ENOTSUP;
>> +    return NULL;
>> +}
>
> Just FYI.
> When you return NULL the errno is compeltely ignored and a ENOMEM error
> is assumed.
> If you want to return ENOTSUP you'd have to allocate a sub request and
> then set an error using the appropriate tevent function call.

I'm not sure I completely understand what a sub request is (or what an 
appropriate tevent function call is). I believe this call is required to 
implement AIO, which we plan to next (as mentioned in the TODO above). 
Can we leave this as-is for now as the code path is not executed? It 
will anyways get addressed during the AIO support.

>> +static ssize_t vfs_gluster_pread_recv(struct tevent_req *req, int *err)
>> +{
>> +    errno = ENOTSUP;
>> +    return -1;
>> +}
>
> If the _send is not successful the _read is never called, and errno is
> also ignored, you'd need to return the error in *err.
>
> These 2 considerations hold for all the following _send/_recv function
> pairs.
>
>> +static int vfs_gluster_fallocate(struct vfs_handle_struct *handle,
>> +                 struct files_struct *fsp,
>> +                 enum vfs_fallocate_mode mode,
>> +                 off_t offset, off_t len)
>> +{
>> +    errno = ENOTSUP;
>> +    return -1;
>> +}
>
> Are you sure you do not want to implement this one ?
> If you return ENOTSUP samba will use vfs_slow_fallocate() which will use
> pwrite to send a ton of zeros to your filesystem. Is this the most
> efficient way on gluster ?

fallocate() support is currently under review in glusterfs (likely to be 
merged soon). This vfs_slow_fallocate() behavior is not a new issue as 
this has been how it worked with the FUSE mount as well. Once 
fallocate() support is merged, I will send a new patch to use the API call.

>> +static int vfs_gluster_kernel_flock(struct vfs_handle_struct *handle,
>> +                    files_struct *fsp, uint32 share_mode,
>> +                    uint32_t access_mask)
>> +{
>> +    return 0;
>> +}
>
> Lieing about the success of locking is not really a good idea.
> I think you should return -1 here and advice users of gluster vfs to set
> "kernel share modes" to false instead.

Does this flock() strictly have to be in a separate lock space from 
fcntl()? In Linux they are separete, but how does samba deal with other 
OSes where they compete in the same lockspace? In gluster we have a 
single lockspace. Will errno = ENOSYS make samba autodetect the lack of 
'kernel share modes' support?

>> +static NTSTATUS vfs_gluster_notify_watch(struct vfs_handle_struct
>> *handle,
>> +                     struct sys_notify_context *ctx,
>> +                     const char *path, uint32_t *filter,
>> +                     uint32_t *subdir_filter,
>> +                     void (*callback) (struct sys_notify_context *ctx,
>> +                               void *private_data,
>> +                               struct notify_event *ev),
>> +                     void *private_data, void *handle_p)
>> +{
>> +    return NT_STATUS_OK;
>> +}
>
> If you do not support notifications you should probably return an error
> here.
> NT_STATUS_NOT_IMPLEMENTED I would think, in this case.

Done.

>> +static int vfs_gluster_get_real_filename(struct vfs_handle_struct
>> *handle,
>> +                     const char *path, const char *name,
>> +                     TALLOC_CTX *mem_ctx, char **found_name)
>> +{
>> +    int ret;
>> +    char key_buf[NAME_MAX + 64];
>> +    char val_buf[NAME_MAX + 1];
>> +
>> +    if (strlen(name) >= NAME_MAX) {
>> +        errno = ENAMETOOLONG;
>> +        return -1;
>> +    }
>> +
>> +    snprintf(key_buf, NAME_MAX + 64,
>> +         "trusted.glusterfs.get_real_filename:%s", name);
>
> Are you sure you can always access 'trusted' attributes ?
> IIRC trusted can only be accessed by root or a process with equivalent
> capabilities, while samba performs file operations as the user, does
> gluster do no checking on who is accessing the attributes ?
> If gluster doesn't then a lot of the code in this module neeeds a ton of
> access control applied on top.

Ah, changed to "user." namespace. I have been testing as root and the 
issue hadn't shown up (yet).

>> +static SMB_ACL_T vfs_gluster_sys_acl_get_file(struct
>> vfs_handle_struct *handle,
>> +                          const char *path_p,
>> +                          SMB_ACL_TYPE_T type,
>> +                          TALLOC_CTX *mem_ctx)
>> +{
>> +    struct smb_acl_t *result;
>> +    acl_t acl;
>> +    char *buf;
>> +    char *key;
>> +    ssize_t ret;
>> +
>> +    switch (type) {
>> +    case SMB_ACL_TYPE_ACCESS:
>> +        key = "system.posix_acl_access";
>> +        break;
>> +    case SMB_ACL_TYPE_DEFAULT:
>> +        key = "system.posix_acl_default";
>> +        break;
>> +    default:
>> +        errno = EINVAL;
>> +        return NULL;
>> +    }
>> +
>> +    ret = glfs_getxattr(handle->data, path_p, key, 0, 0);
>> +    if (ret <= 0)
>> +        return NULL;
>> +
>> +    buf = alloca(ret);
>> +    ret = glfs_getxattr(handle->data, path_p, key, buf, ret);
>> +    if (ret <= 0)
>> +        return NULL;
>> +
>> +    acl = acl_copy_int(buf);
>> +
>> +    if (!acl)
>> +        return NULL;
>> +
>> +    result = smb_acl_to_internal(acl, mem_ctx);
>> +    acl_free(acl);
>> +    return result;
>> +}
>
> I see that here you are calling acl_copy_int() first and then
> smb_acl_to_internal() that you exposed in the previous patch.
>
> I was wondering, instead of doing this in master maybe it makes sense to
> add a new vfs call like
>
> vfs_linux_acl_get_file() that returns 'buf', and then let internal samba
> code perform acl_copy_int()+smb_acl_to_internal()
> This way we do not expose this internal function and let the details on
> how to manage data suitable for passing to the linux's acl library
> inside the core ?
>
> Same for the other similar calls.
>
> Jeremy,
> what do you think ?

Gluster's ACL is byte-compatible with Linux's POSIX ACL format. The use 
of acl_copy_int() here is possibly dangerous on non-Linux servers 
(though in reality most of the OS's POSIX ACL format is byte 
compatible?). In that sense, maybe it makes sense to have a Gluster 
format to smb_acl_t converter implemented natively in this module 
(though it might look similar to smb_acl_to_internal).

This would settle the debate of exposing other module internals too. 
Thoughts?

> The patchset looks mostly ok to me, and I only found nitpicks so far. Of
> course I haven't tested it, so before it gets in someone needs to be
> able to do at least some basic testing.

We have been doing testing of this in Red Hat Storage QE. Issues 
uncovered are mostly in the new gluster gfapi and nothing really in this 
module.

> Also I am a bit concerned about how access control is performed when
> using the glusterfs library directly.
> Can you shed some light about that please ?
>

The Samba connects to gluster bricks as a trusted client (running in the 
same trusted hosts as the bricks, verified by host authentication by 
IP). This means the library performs getpid() etc. and presents those 
values as the identity in the RPC request. The gluster bricks perform 
access control by trusting the presented identity.

Running Samba outside the gluster trusted storage pool requires some 
more auth work, and we do not plan to support that yet.

Avati