vfs_commit.c feature patch: sync at eof (before file close)

James Peach jpeach at samba.org
Fri Nov 9 21:36:05 GMT 2007


On Nov 9, 2007, at 11:58 AM, SER.RI-TIC - David Losada wrote:

> Hi James,
>
> thanks for the reviewed patch. I think the names for the modes weren't
> very good... maybe first let's review the modes as I originally laid  
> them:
>
> * predict: if client sends file size info, commit whenever the client
> writes at that file position or beyond it. If no info, never commit.
> * strict: commit whenever the size of the file actually grows.  
> *Ignore*
> file size hinted by the client.
> * combined: commit whenever the size of the file grows, but refrain to
> do it until it has reached the file size hinted by the client.
>
> In the case of Explorer copies, "strict" was different from "predict":
> it would commit *much* more often (it ignored the file size hint)
>
> Looking back, I think "strict" is useless, because it's not really
> better than "combined".
>
> Your patch is equivalent to the "combined mode", taking out the  
> other 2
> modes.
>
> "Combined mode" is nice, it uses the file size declared by client, if
> available. But it also has it's downside: it commits every time the  
> file
> grows. In some cases people could find very slow transfers.
>
> An example is custom applications that write ASCII files line by line.
> We have some applications like that here, and syncing after every  
> write
> results in a transfer of about 2 KB/sec :) I may need to be that
> careful, because of our NFS quotas problem, but some people may not.  
> So
> I think there was a reason to have a "predict" mode.
>
> I propose to offer two modes, and name them better: the default  
> would be
> "hinted" (~ "predict") and "growth" (~ "combined"). The attached patch
> (against 3.0.26a) follows the philosophy of your patch, modifying it
> slightly.

Sounds good. I'll review and commit this sometime in the next few days.

>
>
> About backporting to RHEL4/5.. I will check how cleanly those svn
> patches can be applied.
>
> kind regards,
>
> En/na James Peach ha escrit:
>> On 31/10/2007, at 3:35 PM, <david.losada at urv.cat>
>> <david.losada at urv.cat> wrote:
>>
>>> Hello,
>>
>> Hi David,
>>
>> thanks for the patch.
>>
>>> here I'm attaching a patch for vfs_commit.c, against 3.0.26a  
>>> release.
>>> It provides the following changes:
>>>
>>> 1. check errors after fsync
>>> Errors can arise due to fsync (or fdatasync) system calls. The
>>> current version of the module doesn't check for them, the patch
>>> addresses this.
>>>
>>> The exception is syncing before closing the file, errors are not
>>> checked then. My interpretation is that if errors are ignored when
>>> syncing, those errors will raise again when closing the file.
>>>
>>> 2. new optional feature: syncing at end of file
>>> This feature tries to address some woes we were suffering in our
>>> setup, with Samba over NFS quotas, (and Windows clients)
>>>
>>> The aim is to sync the file before the client decides to close it,  
>>> so
>>> quota errors are reported at the right time for Windows clients.  
>>> But,
>>> should sync as little as possible, so less performance is lost. The
>>> module allows the user to select different strategies:
>>>
>>> - predict (default): when copying files from the Windows explorer
>>> shell, the client declares how big is the file that is going to be
>>> copied. This translates into an ftruncate VFS call. This way, the
>>> module can tell when the last byte of file has been written, and  
>>> sync
>>> immediately after.
>>>
>>> - strict: this option works when the user has set "strict allocate =
>>> yes". In this strategy, the module syncs whenever a write operation
>>> has resulted in an increased file size.
>>
>> So predict and strict aren't really very different. In fact, for
>> Explorer copies, strict mode will effectively become predict mode.
>>
>> I simplified and combined the modes to always track EOF and I think
>> the resulting behaviour should be equivalent.
>>
>> Could you please review the attached patch?
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>>
>>
>>> - combined: if the client doesn't declare a file size, behavior is
>>> the same as "strict". If a file size is declared, behaves like
>>> "predict" I have tested the patch with 3.0.26a, and as far as I can
>>> tell it behaves as expected. Unfortunately, I can't add this module
>>> to the RHEL4 and RHEL5 setups we have at work, due to the changes in
>>> the VFS API... I'm missing the ability to add extensions to the
>>> files_struct . So now I'm looking how to backport it to our needs..
>>> any guides for this task?
>>
>>
>> You'd have to backport the extensions API and the corresponding  
>> changes to the VFS layer as well. IIRC there's 2 or 3 svn revisions  
>> that add what you need.
>>
>>
>>
>> --
>>
>> James Peach | jpeach at samba.org
>>
>>
>>
>>
>>
>>
>
> diff -r -U 10 a/source/modules/vfs_commit.c b/source/modules/ 
> vfs_commit.c
> --- a/source/modules/vfs_commit.c	2007-03-01 05:54:33.000000000 +0100
> +++ b/source/modules/vfs_commit.c	2007-11-09 20:18:08.000000000 +0100
> @@ -1,12 +1,13 @@
> /*
> - * Copyright (c) James Peach 2006
> + * Copyright (c) James Peach 2006,2007
> + * Copyright (c) David Losada Carballo 2007
>  *
>  * This program is free software; you can redistribute it and/or  
> modify
>  * it under the terms of the GNU General Public License as published  
> by
>  * the Free Software Foundation; either version 2 of the License, or
>  * (at your option) any later version.
>  *
>  * This program is distributed in the hope that it will be useful,
>  * but WITHOUT ANY WARRANTY; without even the implied warranty of
>  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>  * GNU General Public License for more details.
> @@ -23,168 +24,309 @@
>  * The purpose of this module is to flush data to disk at regular  
> intervals,
>  * just like the NFS commit operation. There's two rationales for  
> this. First,
>  * it minimises the data loss in case of a power outage without  
> incurring
>  * the poor performance of synchronous I/O. Second, a steady flush  
> rate
>  * can produce better throughput than suddenly dumping massive  
> amounts of
>  * writes onto a disk.
>  *
>  * Tunables:
>  *
>  *  commit: dthresh         Amount of dirty data that can accumulate
> - *			                before we commit (sync) it.
> + *                          before we commit (sync) it.
>  *
>  *  commit: debug           Debug level at which to emit messages.
>  *
> + *  commit: on eof          Commit after writing what could be the  
> last bytes
> + *                          of the file. This option doesn't behave  
> as expected
> + *                          when having write caches enabled.  
> Default = no.
> + *
> + *  commit: eof mode        String. Tunes how the module tries to  
> guess when
> + *                          the client has written the last bytes  
> of the file.
> + *                          Possible values (default = hinted):
> + *
> + *     (*)  = hinted        Some clients (i.e. Windows Explorer)  
> declare the
> + *                          size of the file before transferring  
> it. With this
> + *                          option, we remember that hint, and  
> commit after
> + *                          writing in that file position. If the  
> client
> + *                          doesn't declare the size of file,  
> commiting on EOF
> + *                          is not triggered.
> + *
> + *          = growth        ONLY in shares with "strict allocate"  
> enabled. It
> + *                          commits after a write operation has  
> made the file
> + *                          size grow. If the client declares a  
> file size, it
> + *                          refrains to commit until the file has  
> reached it.
> + *                          Useful for assuring quotas in NFS-based  
> shares.
> + *
>  */
>
> #define MODULE "commit"
>
> static int module_debug;
>
> +enum eof_mode
> +{
> +    EOF_NONE = 0x0000,
> +    EOF_HINTED = 0x0001,
> +    EOF_GROWTH = 0x0002
> +};
> +
> struct commit_info
> {
> +        /* For chunk-based commits */
>         SMB_OFF_T dbytes;	/* Dirty (uncommitted) bytes */
>         SMB_OFF_T dthresh;	/* Dirty data threshold */
> +        /* For commits on EOF */
> +	enum eof_mode on_eof;
> +        SMB_OFF_T eof;     /* Expected file size */
> };
>
> -static void commit_all(
> +static int commit_do(
> +        struct commit_info *            c,
> +        int                             fd)
> +{
> +        int result;
> +
> +        result = fdatasync(fd);
> +        if (result == 0) {
> +                c->dbytes = 0;	/* on success, no dirty bytes */
> +        }
> +        return result;
> +}
> +
> +static int commit_all(
>         struct vfs_handle_struct *	handle,
>         files_struct *		        fsp)
> {
>         struct commit_info *c;
>
>         if ((c = VFS_FETCH_FSP_EXTENSION(handle, fsp))) {
>                 if (c->dbytes) {
>                         DEBUG(module_debug,
>                                 ("%s: flushing %lu dirty bytes\n",
>                                  MODULE, (unsigned long)c->dbytes));
>
> -                        fdatasync(fsp->fh->fd);
> -                        c->dbytes = 0;
> +                        return commit_do(c, fsp->fh->fd);
>                 }
>         }
> +        return 0;
> }
>
> -static void commit(
> +static int commit_logic(
>         struct vfs_handle_struct *	handle,
>         files_struct *		        fsp,
> +	SMB_OFF_T			offset,
>         ssize_t			        last_write)
> {
>         struct commit_info *c;
>
> -        if ((c = VFS_FETCH_FSP_EXTENSION(handle, fsp))) {
> -
> -                if (last_write > 0) {
> -                        c->dbytes += last_write;
> -                }
> +        if ((c = VFS_FETCH_FSP_EXTENSION(handle, fsp)) == NULL) {
> +		return 0;
> +	}
> +
> +	c->dbytes += last_write;	/* dirty bytes always counted */
> +
> +	if (c->dthresh && (c->dbytes > c->dthresh)) {
> +		DEBUG(module_debug,
> +			("%s: flushing %lu dirty bytes\n",
> +			 MODULE, (unsigned long)c->dbytes));
> +
> +		return commit_do(c, fsp->fh->fd);
> +	}
> +
> +	/* If we don't try to commit on eof OR we don't have the needed
> +	 * client hints for it, return */
> +	if ((c->on_eof == EOF_NONE) ||
> +	    (c->on_eof == EOF_HINTED && c->eof == 0)) {	
> +		return 0;
> +	}
> +
> +	/* This write hit or went past our current idea of the file size. */
> +	if (offset + last_write >= c->eof) {
> +		if (commit_do(c, fsp->fh->fd) == -1) {
> +			return -1;
> +		}
> +
> +		/* In case we track file size, adjust EOF upwards */
> +		if (c->on_eof == EOF_GROWTH) {
> +			c->eof = offset + last_write;
> +		}
> +	}
>
> -                if (c->dbytes > c->dthresh) {
> -                        DEBUG(module_debug,
> -                                ("%s: flushing %lu dirty bytes\n",
> -                                 MODULE, (unsigned long)c->dbytes));
> -
> -                        fdatasync(fsp->fh->fd);
> -                        c->dbytes = 0;
> -                }
> -        }
> +        return 0;
> }
>
> static int commit_connect(
>         struct vfs_handle_struct *  handle,
>         const char *                service,
>         const char *                user)
> {
>         module_debug = lp_parm_int(SNUM(handle->conn), MODULE,  
> "debug", 100);
>         return SMB_VFS_NEXT_CONNECT(handle, service, user);
> }
>
> static int commit_open(
> 	vfs_handle_struct * handle,
> 	const char *	    fname,
> 	files_struct *	    fsp,
> 	int		    flags,
> 	mode_t		    mode)
> {
>         SMB_OFF_T dthresh;
> +        bool on_eof;
> +        const char *eof_mode;
> +        bool strict_alloc;
> +        struct commit_info *c = NULL;
> +	int fd;
>
>         /* Don't bother with read-only files. */
>         if ((flags & O_ACCMODE) == O_RDONLY) {
>                 return SMB_VFS_NEXT_OPEN(handle, fname, fsp, flags,  
> mode);
>         }
>
> +        /* Read and check module configuration */
>         dthresh = conv_str_size(lp_parm_const_string(SNUM(handle- 
> >conn),
>                                         MODULE, "dthresh", NULL));
> +        on_eof = lp_parm_bool(SNUM(handle->conn), MODULE, "on eof",  
> False);
> +        eof_mode = lp_parm_const_string(SNUM(handle->conn),
> +                                        MODULE, "eof mode",  
> "predict");
> +        strict_alloc = lp_strict_allocate(SNUM(fsp->conn));
>
> -        if (dthresh > 0) {
> -                struct commit_info * c;
> +        if (dthresh > 0 || on_eof) {
>                 c = VFS_ADD_FSP_EXTENSION(handle, fsp, struct  
> commit_info);
> +                /* Process main tunables */
>                 if (c) {
>                         c->dthresh = dthresh;
>                         c->dbytes = 0;
> +                        c->on_eof = EOF_NONE;
> +                        c->eof = 0;
> +                }
> +        }
> +        /* Process eof_mode tunable */
> +        if (c && on_eof) {
> +		c->on_eof = EOF_HINTED;	/* default mode */
> +                if (!StrCaseCmp(eof_mode, "growth")) {
> +                        c->on_eof = EOF_GROWTH;
> +                }
> +
> +                if ((c->on_eof & EOF_GROWTH) && !strict_alloc) {
> +			DEBUG(module_debug,
> +			    ("downgrading growth to hinted EOF mode because "
> +			     "strict alloc is disabled\n"));
> +                        c->on_eof = EOF_HINTED;
>                 }
>         }
>
> -        return SMB_VFS_NEXT_OPEN(handle, fname, fsp, flags, mode);
> +        fd = SMB_VFS_NEXT_OPEN(handle, fname, fsp, flags, mode);
> +	if (fd == -1) {
> +		VFS_REMOVE_FSP_EXTENSION(handle, fsp);
> +		return fd;
> +	}
> +
> +        /* Do commit modes require us to do the fstat()? */
> +        if (c && (c->on_eof != EOF_NONE)) {
> +                SMB_STRUCT_STAT st;
> +                if (SMB_VFS_FSTAT(fsp, fd, &st) == -1) {
> +                        return -1;
> +                }
> +		c->eof = st.st_size;
> +        }
> +
> +        return 0;
> }
>
> static ssize_t commit_write(
>         vfs_handle_struct * handle,
>         files_struct *      fsp,
>         int                 fd,
>         void *              data,
>         size_t              count)
> {
>         ssize_t ret;
>
>         ret = SMB_VFS_NEXT_WRITE(handle, fsp, fd, data, count);
> -        commit(handle, fsp, ret);
> +
> +        if (ret > 0) {	/* on write success */
> +                /* Do commit logic, checking errors that can arise */
> +                if (commit_logic(handle, fsp, fsp->fh->pos, ret) ==  
> -1) {
> +                        return -1;
> +                }
> +        }
>
>         return ret;
> }
>
> static ssize_t commit_pwrite(
>         vfs_handle_struct * handle,
>         files_struct *      fsp,
>         int                 fd,
>         void *              data,
>         size_t              count,
> 	SMB_OFF_T	    offset)
> {
>         ssize_t ret;
>
>         ret = SMB_VFS_NEXT_PWRITE(handle, fsp, fd, data, count,  
> offset);
> -        commit(handle, fsp, ret);
> +
> +        if (ret > 0) {	/* on write success */
> +                /* Do commit logic, checking errors that can arise */
> +                if (commit_logic(handle, fsp, offset, ret) == -1) {
> +                        return -1;
> +                }
> +        }
>
>         return ret;
> }
>
> -static ssize_t commit_close(
> +static int commit_close(
>         vfs_handle_struct * handle,
>         files_struct *      fsp,
>         int                 fd)
> {
> +        /* Commit errors not checked, close() will find them again */
>         commit_all(handle, fsp);
>         return SMB_VFS_NEXT_CLOSE(handle, fsp, fd);
> }
>
> +static int commit_ftruncate(
> +        vfs_handle_struct * handle,
> +        files_struct *      fsp,
> +        int                 fd,
> +        SMB_OFF_T           len)
> +{
> +        int result;
> +
> +        result = SMB_VFS_NEXT_FTRUNCATE(handle, fsp, fd, len);
> +        if (result == 0) {
> +		struct commit_info *c;
> +		if ((c = VFS_FETCH_FSP_EXTENSION(handle, fsp))) {
> +			commit_logic(handle, fsp, len, 0);
> +			c->eof = len;
> +		}
> +        }
> +
> +        return result;
> +}
> +
> static vfs_op_tuple commit_ops [] =
> {
>         {SMB_VFS_OP(commit_open),
>                 SMB_VFS_OP_OPEN, SMB_VFS_LAYER_TRANSPARENT},
>         {SMB_VFS_OP(commit_close),
>                 SMB_VFS_OP_CLOSE, SMB_VFS_LAYER_TRANSPARENT},
>         {SMB_VFS_OP(commit_write),
>                 SMB_VFS_OP_WRITE, SMB_VFS_LAYER_TRANSPARENT},
>         {SMB_VFS_OP(commit_pwrite),
>                 SMB_VFS_OP_PWRITE, SMB_VFS_LAYER_TRANSPARENT},
>         {SMB_VFS_OP(commit_connect),
>                 SMB_VFS_OP_CONNECT,  SMB_VFS_LAYER_TRANSPARENT},
> +        {SMB_VFS_OP(commit_ftruncate),
> +                SMB_VFS_OP_FTRUNCATE,  SMB_VFS_LAYER_TRANSPARENT},
>
>         {SMB_VFS_OP(NULL), SMB_VFS_OP_NOOP, SMB_VFS_LAYER_NOOP}
> };
>
> NTSTATUS vfs_commit_init(void);
> NTSTATUS vfs_commit_init(void)
> {
> 	return smb_register_vfs(SMB_VFS_INTERFACE_VERSION, MODULE,  
> commit_ops);
> }
>
> <david.losada.vcf>



More information about the samba-technical mailing list