Bogus rsync "Success" message when out of disk space

John Van Essen vanes002 at UMN.EDU
Sun Apr 27 06:09:29 EST 2003


Patches welcome, eh, Paul?

Upon further (belated) investigation, there are 2 affected places
in receiver.c with this error message.  Both call write_file().
And write_file is called only in those two places.  So that is the
appropriate location to patch.  Especially since the obvious fix is
to use the rewrite code already there for the sparse file writes.

-------------------------------------8<-------------------------------------
--- fileio.c.orig       Fri Jan 25 15:07:34 2002
+++ fileio.c    Sat Apr 26 12:16:25 2003
@@ -69,25 +69,28 @@
        return len;
 }
 
 
 
 int write_file(int f,char *buf,size_t len)
 {
        int ret = 0;
 
-       if (!sparse_files) {
-               return write(f,buf,len);
-       }
-
        while (len>0) {
-               int len1 = MIN(len, SPARSE_WRITE_SIZE);
-               int r1 = write_sparse(f, buf, len1);
+               int r1;
+               if (sparse_files) {
+                       int len1 = MIN(len, SPARSE_WRITE_SIZE);
+                       r1 = write_sparse(f, buf, len1);
+               } else {
+                       /* Normal writes are in this loop, too, so that */
+                       /*   retries of partial writes will set errno.  */
+                       r1 = write(f, buf, len);
+               }
                if (r1 <= 0) {
                        if (ret > 0) return ret;
                        return r1;
                }
                len -= r1;
                buf += r1;
                ret += r1;
        }
        return ret;
-------------------------------------8<-------------------------------------

Test of unpatched rsync (bogus error message):

write failed on jvejunk2 : Success
rsync error: error in file IO (code 11) at receiver.c(243)
rsync: connection unexpectedly closed (200 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(165)


Test of patched rsync (correct error message):

write failed on jvejunk2 : No space left on device
rsync error: error in file IO (code 11) at receiver.c(243)
rsync: connection unexpectedly closed (200 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(165)

-- 
        John Van Essen <vanes002 at umn.edu>




On Thu, 7 Nov 2002 12:37:09, Green, Paul <Paul.Green at stratus.com> wrote:
>John Van Essen [mailto:vanes002 at umn.edu] wrote, in partial:
>> In receiver.c there are three places with code similar to:
>>
>>    if (fd != -1 && write_file(fd,data,i) != i) {
>>        rprintf(FERROR,"write failed on %s : %s\n",fname,strerror(errno));
>>        exit_cleanup(RERR_FILEIO);
>>    }
>>
>> A partial write due to running out of room (as opposed to a write
>> that fails to write any data at all) apparently does not set an
>> error code (errno stays 0).  This is on RedHat 7.2 btw.
>[snip]
>> So the rprintf lines need to be changed to something like this:
>> 
>>         rprintf(FERROR,"write failed on %s : %s\n",fname,
>>           strerror(errno ? errno : ENOSPC));
>
>I disagree.  If you read the POSIX standard, you will see that the error
>code is returned on the NEXT call to write the data. POSIX requires that the
>"short write" not return any error.  (It can't, because it has to set the
>return value to -1 to indicate the error, but it also has to set the return
>value to the number of bytes written in the short write; hence, two calls
>are needed).  So, there are more problems with this code than just the way
>it handles errors.
>
>See paragraph 6.4.2.1 of the POSIX-1996 standard.  To quote the relevant
>paragraph here:
>
>"If a write() requests that more bytes be written than there is room for
>(for example, the physical end of a medium), only as many bytes as there is
>room for shall be written. For example, suppose there is space for 20 bytes
>more in a file before reaching a limit. A write of 512 bytes would return
>20. The next write of a nonzero number of bytes would give a failure return
>(except as noted below)."
>
>The exception has to do with getting interrupted by a signal and is not
>interesting here.
>
>"Upon successful completion, write() shall return an integer indicating the
>number of bytes actually written. Otherwise, it shall return a value of -1
>and set errno to indicate the error."
>
>"[ENOSPC] There is no free space remaining on the device containing the
>file."
>
>So, RedHat is correct in not setting errno in this case.  And you are
>correct that ENOSPC is the proper error code for this case, but you only
>know it is out of space via delphic knowledge.  rsync can't possibly know
>that.  I would not want to see rsync assume it knows the proper error code;
>that's the job of the OS.  Instead, rsync should be recoded to continue
>attempting to write data until get it gets a nonzero error code, not until
>it gets a short write.
>
>Patches welcome.
>
>PG
>--
>Paul Green, Senior Technical Consultant, Stratus Technologies.
>Voice: +1 (978) 461-7557; FAX: +1 (978) 461-3610
>Speaking from Stratus not for Stratus



More information about the rsync mailing list