vfs_commit.c feature patch: sync at eof (before file close)
James Peach
jpeach at samba.org
Fri Nov 9 21:36:05 GMT 2007
On Nov 9, 2007, at 11:58 AM, SER.RI-TIC - David Losada wrote:
> Hi James,
>
> thanks for the reviewed patch. I think the names for the modes weren't
> very good... maybe first let's review the modes as I originally laid
> them:
>
> * predict: if client sends file size info, commit whenever the client
> writes at that file position or beyond it. If no info, never commit.
> * strict: commit whenever the size of the file actually grows.
> *Ignore*
> file size hinted by the client.
> * combined: commit whenever the size of the file grows, but refrain to
> do it until it has reached the file size hinted by the client.
>
> In the case of Explorer copies, "strict" was different from "predict":
> it would commit *much* more often (it ignored the file size hint)
>
> Looking back, I think "strict" is useless, because it's not really
> better than "combined".
>
> Your patch is equivalent to the "combined mode", taking out the
> other 2
> modes.
>
> "Combined mode" is nice, it uses the file size declared by client, if
> available. But it also has it's downside: it commits every time the
> file
> grows. In some cases people could find very slow transfers.
>
> An example is custom applications that write ASCII files line by line.
> We have some applications like that here, and syncing after every
> write
> results in a transfer of about 2 KB/sec :) I may need to be that
> careful, because of our NFS quotas problem, but some people may not.
> So
> I think there was a reason to have a "predict" mode.
>
> I propose to offer two modes, and name them better: the default
> would be
> "hinted" (~ "predict") and "growth" (~ "combined"). The attached patch
> (against 3.0.26a) follows the philosophy of your patch, modifying it
> slightly.
Sounds good. I'll review and commit this sometime in the next few days.
>
>
> About backporting to RHEL4/5.. I will check how cleanly those svn
> patches can be applied.
>
> kind regards,
>
> En/na James Peach ha escrit:
>> On 31/10/2007, at 3:35 PM, <david.losada at urv.cat>
>> <david.losada at urv.cat> wrote:
>>
>>> Hello,
>>
>> Hi David,
>>
>> thanks for the patch.
>>
>>> here I'm attaching a patch for vfs_commit.c, against 3.0.26a
>>> release.
>>> It provides the following changes:
>>>
>>> 1. check errors after fsync
>>> Errors can arise due to fsync (or fdatasync) system calls. The
>>> current version of the module doesn't check for them, the patch
>>> addresses this.
>>>
>>> The exception is syncing before closing the file, errors are not
>>> checked then. My interpretation is that if errors are ignored when
>>> syncing, those errors will raise again when closing the file.
>>>
>>> 2. new optional feature: syncing at end of file
>>> This feature tries to address some woes we were suffering in our
>>> setup, with Samba over NFS quotas, (and Windows clients)
>>>
>>> The aim is to sync the file before the client decides to close it,
>>> so
>>> quota errors are reported at the right time for Windows clients.
>>> But,
>>> should sync as little as possible, so less performance is lost. The
>>> module allows the user to select different strategies:
>>>
>>> - predict (default): when copying files from the Windows explorer
>>> shell, the client declares how big is the file that is going to be
>>> copied. This translates into an ftruncate VFS call. This way, the
>>> module can tell when the last byte of file has been written, and
>>> sync
>>> immediately after.
>>>
>>> - strict: this option works when the user has set "strict allocate =
>>> yes". In this strategy, the module syncs whenever a write operation
>>> has resulted in an increased file size.
>>
>> So predict and strict aren't really very different. In fact, for
>> Explorer copies, strict mode will effectively become predict mode.
>>
>> I simplified and combined the modes to always track EOF and I think
>> the resulting behaviour should be equivalent.
>>
>> Could you please review the attached patch?
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>>
>>
>>> - combined: if the client doesn't declare a file size, behavior is
>>> the same as "strict". If a file size is declared, behaves like
>>> "predict" I have tested the patch with 3.0.26a, and as far as I can
>>> tell it behaves as expected. Unfortunately, I can't add this module
>>> to the RHEL4 and RHEL5 setups we have at work, due to the changes in
>>> the VFS API... I'm missing the ability to add extensions to the
>>> files_struct . So now I'm looking how to backport it to our needs..
>>> any guides for this task?
>>
>>
>> You'd have to backport the extensions API and the corresponding
>> changes to the VFS layer as well. IIRC there's 2 or 3 svn revisions
>> that add what you need.
>>
>>
>>
>> --
>>
>> James Peach | jpeach at samba.org
>>
>>
>>
>>
>>
>>
>
> diff -r -U 10 a/source/modules/vfs_commit.c b/source/modules/
> vfs_commit.c
> --- a/source/modules/vfs_commit.c 2007-03-01 05:54:33.000000000 +0100
> +++ b/source/modules/vfs_commit.c 2007-11-09 20:18:08.000000000 +0100
> @@ -1,12 +1,13 @@
> /*
> - * Copyright (c) James Peach 2006
> + * Copyright (c) James Peach 2006,2007
> + * Copyright (c) David Losada Carballo 2007
> *
> * This program is free software; you can redistribute it and/or
> modify
> * it under the terms of the GNU General Public License as published
> by
> * the Free Software Foundation; either version 2 of the License, or
> * (at your option) any later version.
> *
> * This program is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> * GNU General Public License for more details.
> @@ -23,168 +24,309 @@
> * The purpose of this module is to flush data to disk at regular
> intervals,
> * just like the NFS commit operation. There's two rationales for
> this. First,
> * it minimises the data loss in case of a power outage without
> incurring
> * the poor performance of synchronous I/O. Second, a steady flush
> rate
> * can produce better throughput than suddenly dumping massive
> amounts of
> * writes onto a disk.
> *
> * Tunables:
> *
> * commit: dthresh Amount of dirty data that can accumulate
> - * before we commit (sync) it.
> + * before we commit (sync) it.
> *
> * commit: debug Debug level at which to emit messages.
> *
> + * commit: on eof Commit after writing what could be the
> last bytes
> + * of the file. This option doesn't behave
> as expected
> + * when having write caches enabled.
> Default = no.
> + *
> + * commit: eof mode String. Tunes how the module tries to
> guess when
> + * the client has written the last bytes
> of the file.
> + * Possible values (default = hinted):
> + *
> + * (*) = hinted Some clients (i.e. Windows Explorer)
> declare the
> + * size of the file before transferring
> it. With this
> + * option, we remember that hint, and
> commit after
> + * writing in that file position. If the
> client
> + * doesn't declare the size of file,
> commiting on EOF
> + * is not triggered.
> + *
> + * = growth ONLY in shares with "strict allocate"
> enabled. It
> + * commits after a write operation has
> made the file
> + * size grow. If the client declares a
> file size, it
> + * refrains to commit until the file has
> reached it.
> + * Useful for assuring quotas in NFS-based
> shares.
> + *
> */
>
> #define MODULE "commit"
>
> static int module_debug;
>
> +enum eof_mode
> +{
> + EOF_NONE = 0x0000,
> + EOF_HINTED = 0x0001,
> + EOF_GROWTH = 0x0002
> +};
> +
> struct commit_info
> {
> + /* For chunk-based commits */
> SMB_OFF_T dbytes; /* Dirty (uncommitted) bytes */
> SMB_OFF_T dthresh; /* Dirty data threshold */
> + /* For commits on EOF */
> + enum eof_mode on_eof;
> + SMB_OFF_T eof; /* Expected file size */
> };
>
> -static void commit_all(
> +static int commit_do(
> + struct commit_info * c,
> + int fd)
> +{
> + int result;
> +
> + result = fdatasync(fd);
> + if (result == 0) {
> + c->dbytes = 0; /* on success, no dirty bytes */
> + }
> + return result;
> +}
> +
> +static int commit_all(
> struct vfs_handle_struct * handle,
> files_struct * fsp)
> {
> struct commit_info *c;
>
> if ((c = VFS_FETCH_FSP_EXTENSION(handle, fsp))) {
> if (c->dbytes) {
> DEBUG(module_debug,
> ("%s: flushing %lu dirty bytes\n",
> MODULE, (unsigned long)c->dbytes));
>
> - fdatasync(fsp->fh->fd);
> - c->dbytes = 0;
> + return commit_do(c, fsp->fh->fd);
> }
> }
> + return 0;
> }
>
> -static void commit(
> +static int commit_logic(
> struct vfs_handle_struct * handle,
> files_struct * fsp,
> + SMB_OFF_T offset,
> ssize_t last_write)
> {
> struct commit_info *c;
>
> - if ((c = VFS_FETCH_FSP_EXTENSION(handle, fsp))) {
> -
> - if (last_write > 0) {
> - c->dbytes += last_write;
> - }
> + if ((c = VFS_FETCH_FSP_EXTENSION(handle, fsp)) == NULL) {
> + return 0;
> + }
> +
> + c->dbytes += last_write; /* dirty bytes always counted */
> +
> + if (c->dthresh && (c->dbytes > c->dthresh)) {
> + DEBUG(module_debug,
> + ("%s: flushing %lu dirty bytes\n",
> + MODULE, (unsigned long)c->dbytes));
> +
> + return commit_do(c, fsp->fh->fd);
> + }
> +
> + /* If we don't try to commit on eof OR we don't have the needed
> + * client hints for it, return */
> + if ((c->on_eof == EOF_NONE) ||
> + (c->on_eof == EOF_HINTED && c->eof == 0)) {
> + return 0;
> + }
> +
> + /* This write hit or went past our current idea of the file size. */
> + if (offset + last_write >= c->eof) {
> + if (commit_do(c, fsp->fh->fd) == -1) {
> + return -1;
> + }
> +
> + /* In case we track file size, adjust EOF upwards */
> + if (c->on_eof == EOF_GROWTH) {
> + c->eof = offset + last_write;
> + }
> + }
>
> - if (c->dbytes > c->dthresh) {
> - DEBUG(module_debug,
> - ("%s: flushing %lu dirty bytes\n",
> - MODULE, (unsigned long)c->dbytes));
> -
> - fdatasync(fsp->fh->fd);
> - c->dbytes = 0;
> - }
> - }
> + return 0;
> }
>
> static int commit_connect(
> struct vfs_handle_struct * handle,
> const char * service,
> const char * user)
> {
> module_debug = lp_parm_int(SNUM(handle->conn), MODULE,
> "debug", 100);
> return SMB_VFS_NEXT_CONNECT(handle, service, user);
> }
>
> static int commit_open(
> vfs_handle_struct * handle,
> const char * fname,
> files_struct * fsp,
> int flags,
> mode_t mode)
> {
> SMB_OFF_T dthresh;
> + bool on_eof;
> + const char *eof_mode;
> + bool strict_alloc;
> + struct commit_info *c = NULL;
> + int fd;
>
> /* Don't bother with read-only files. */
> if ((flags & O_ACCMODE) == O_RDONLY) {
> return SMB_VFS_NEXT_OPEN(handle, fname, fsp, flags,
> mode);
> }
>
> + /* Read and check module configuration */
> dthresh = conv_str_size(lp_parm_const_string(SNUM(handle-
> >conn),
> MODULE, "dthresh", NULL));
> + on_eof = lp_parm_bool(SNUM(handle->conn), MODULE, "on eof",
> False);
> + eof_mode = lp_parm_const_string(SNUM(handle->conn),
> + MODULE, "eof mode",
> "predict");
> + strict_alloc = lp_strict_allocate(SNUM(fsp->conn));
>
> - if (dthresh > 0) {
> - struct commit_info * c;
> + if (dthresh > 0 || on_eof) {
> c = VFS_ADD_FSP_EXTENSION(handle, fsp, struct
> commit_info);
> + /* Process main tunables */
> if (c) {
> c->dthresh = dthresh;
> c->dbytes = 0;
> + c->on_eof = EOF_NONE;
> + c->eof = 0;
> + }
> + }
> + /* Process eof_mode tunable */
> + if (c && on_eof) {
> + c->on_eof = EOF_HINTED; /* default mode */
> + if (!StrCaseCmp(eof_mode, "growth")) {
> + c->on_eof = EOF_GROWTH;
> + }
> +
> + if ((c->on_eof & EOF_GROWTH) && !strict_alloc) {
> + DEBUG(module_debug,
> + ("downgrading growth to hinted EOF mode because "
> + "strict alloc is disabled\n"));
> + c->on_eof = EOF_HINTED;
> }
> }
>
> - return SMB_VFS_NEXT_OPEN(handle, fname, fsp, flags, mode);
> + fd = SMB_VFS_NEXT_OPEN(handle, fname, fsp, flags, mode);
> + if (fd == -1) {
> + VFS_REMOVE_FSP_EXTENSION(handle, fsp);
> + return fd;
> + }
> +
> + /* Do commit modes require us to do the fstat()? */
> + if (c && (c->on_eof != EOF_NONE)) {
> + SMB_STRUCT_STAT st;
> + if (SMB_VFS_FSTAT(fsp, fd, &st) == -1) {
> + return -1;
> + }
> + c->eof = st.st_size;
> + }
> +
> + return 0;
> }
>
> static ssize_t commit_write(
> vfs_handle_struct * handle,
> files_struct * fsp,
> int fd,
> void * data,
> size_t count)
> {
> ssize_t ret;
>
> ret = SMB_VFS_NEXT_WRITE(handle, fsp, fd, data, count);
> - commit(handle, fsp, ret);
> +
> + if (ret > 0) { /* on write success */
> + /* Do commit logic, checking errors that can arise */
> + if (commit_logic(handle, fsp, fsp->fh->pos, ret) ==
> -1) {
> + return -1;
> + }
> + }
>
> return ret;
> }
>
> static ssize_t commit_pwrite(
> vfs_handle_struct * handle,
> files_struct * fsp,
> int fd,
> void * data,
> size_t count,
> SMB_OFF_T offset)
> {
> ssize_t ret;
>
> ret = SMB_VFS_NEXT_PWRITE(handle, fsp, fd, data, count,
> offset);
> - commit(handle, fsp, ret);
> +
> + if (ret > 0) { /* on write success */
> + /* Do commit logic, checking errors that can arise */
> + if (commit_logic(handle, fsp, offset, ret) == -1) {
> + return -1;
> + }
> + }
>
> return ret;
> }
>
> -static ssize_t commit_close(
> +static int commit_close(
> vfs_handle_struct * handle,
> files_struct * fsp,
> int fd)
> {
> + /* Commit errors not checked, close() will find them again */
> commit_all(handle, fsp);
> return SMB_VFS_NEXT_CLOSE(handle, fsp, fd);
> }
>
> +static int commit_ftruncate(
> + vfs_handle_struct * handle,
> + files_struct * fsp,
> + int fd,
> + SMB_OFF_T len)
> +{
> + int result;
> +
> + result = SMB_VFS_NEXT_FTRUNCATE(handle, fsp, fd, len);
> + if (result == 0) {
> + struct commit_info *c;
> + if ((c = VFS_FETCH_FSP_EXTENSION(handle, fsp))) {
> + commit_logic(handle, fsp, len, 0);
> + c->eof = len;
> + }
> + }
> +
> + return result;
> +}
> +
> static vfs_op_tuple commit_ops [] =
> {
> {SMB_VFS_OP(commit_open),
> SMB_VFS_OP_OPEN, SMB_VFS_LAYER_TRANSPARENT},
> {SMB_VFS_OP(commit_close),
> SMB_VFS_OP_CLOSE, SMB_VFS_LAYER_TRANSPARENT},
> {SMB_VFS_OP(commit_write),
> SMB_VFS_OP_WRITE, SMB_VFS_LAYER_TRANSPARENT},
> {SMB_VFS_OP(commit_pwrite),
> SMB_VFS_OP_PWRITE, SMB_VFS_LAYER_TRANSPARENT},
> {SMB_VFS_OP(commit_connect),
> SMB_VFS_OP_CONNECT, SMB_VFS_LAYER_TRANSPARENT},
> + {SMB_VFS_OP(commit_ftruncate),
> + SMB_VFS_OP_FTRUNCATE, SMB_VFS_LAYER_TRANSPARENT},
>
> {SMB_VFS_OP(NULL), SMB_VFS_OP_NOOP, SMB_VFS_LAYER_NOOP}
> };
>
> NTSTATUS vfs_commit_init(void);
> NTSTATUS vfs_commit_init(void)
> {
> return smb_register_vfs(SMB_VFS_INTERFACE_VERSION, MODULE,
> commit_ops);
> }
>
> <david.losada.vcf>
More information about the samba-technical
mailing list