Bogus rsync "Success" message when out of disk space
jw schultz
jw at pegasys.ws
Sun Apr 27 10:42:14 EST 2003
On Sat, Apr 26, 2003 at 03:09:29PM -0500, John Van Essen wrote:
> Patches welcome, eh, Paul?
>
> Upon further (belated) investigation, there are 2 affected places
> in receiver.c with this error message. Both call write_file().
> And write_file is called only in those two places. So that is the
> appropriate location to patch. Especially since the obvious fix is
> to use the rewrite code already there for the sparse file writes.
This looks correct to me. On partial write we need to
retry. The callers to write_file only test for
completeness.
I'd drop the added comments or put a generic
"loop until all is written or error" at the top of the loop.
The inline comments make the code harder to read.
> -------------------------------------8<-------------------------------------
> --- fileio.c.orig Fri Jan 25 15:07:34 2002
> +++ fileio.c Sat Apr 26 12:16:25 2003
> @@ -69,25 +69,28 @@
> return len;
> }
>
>
>
> int write_file(int f,char *buf,size_t len)
> {
> int ret = 0;
>
> - if (!sparse_files) {
> - return write(f,buf,len);
> - }
> -
> while (len>0) {
> - int len1 = MIN(len, SPARSE_WRITE_SIZE);
> - int r1 = write_sparse(f, buf, len1);
> + int r1;
> + if (sparse_files) {
> + int len1 = MIN(len, SPARSE_WRITE_SIZE);
> + r1 = write_sparse(f, buf, len1);
> + } else {
> + /* Normal writes are in this loop, too, so that */
> + /* retries of partial writes will set errno. */
> + r1 = write(f, buf, len);
> + }
> if (r1 <= 0) {
> if (ret > 0) return ret;
> return r1;
> }
> len -= r1;
> buf += r1;
> ret += r1;
> }
> return ret;
> -------------------------------------8<-------------------------------------
>
> Test of unpatched rsync (bogus error message):
>
> write failed on jvejunk2 : Success
> rsync error: error in file IO (code 11) at receiver.c(243)
> rsync: connection unexpectedly closed (200 bytes read so far)
> rsync error: error in rsync protocol data stream (code 12) at io.c(165)
>
>
> Test of patched rsync (correct error message):
>
> write failed on jvejunk2 : No space left on device
> rsync error: error in file IO (code 11) at receiver.c(243)
> rsync: connection unexpectedly closed (200 bytes read so far)
> rsync error: error in rsync protocol data stream (code 12) at io.c(165)
>
> --
> John Van Essen <vanes002 at umn.edu>
>
>
>
>
> On Thu, 7 Nov 2002 12:37:09, Green, Paul <Paul.Green at stratus.com> wrote:
> >John Van Essen [mailto:vanes002 at umn.edu] wrote, in partial:
> >> In receiver.c there are three places with code similar to:
> >>
> >> if (fd != -1 && write_file(fd,data,i) != i) {
> >> rprintf(FERROR,"write failed on %s : %s\n",fname,strerror(errno));
> >> exit_cleanup(RERR_FILEIO);
> >> }
> >>
> >> A partial write due to running out of room (as opposed to a write
> >> that fails to write any data at all) apparently does not set an
> >> error code (errno stays 0). This is on RedHat 7.2 btw.
> >[snip]
> >> So the rprintf lines need to be changed to something like this:
> >>
> >> rprintf(FERROR,"write failed on %s : %s\n",fname,
> >> strerror(errno ? errno : ENOSPC));
> >
> >I disagree. If you read the POSIX standard, you will see that the error
> >code is returned on the NEXT call to write the data. POSIX requires that the
> >"short write" not return any error. (It can't, because it has to set the
> >return value to -1 to indicate the error, but it also has to set the return
> >value to the number of bytes written in the short write; hence, two calls
> >are needed). So, there are more problems with this code than just the way
> >it handles errors.
> >
> >See paragraph 6.4.2.1 of the POSIX-1996 standard. To quote the relevant
> >paragraph here:
> >
> >"If a write() requests that more bytes be written than there is room for
> >(for example, the physical end of a medium), only as many bytes as there is
> >room for shall be written. For example, suppose there is space for 20 bytes
> >more in a file before reaching a limit. A write of 512 bytes would return
> >20. The next write of a nonzero number of bytes would give a failure return
> >(except as noted below)."
> >
> >The exception has to do with getting interrupted by a signal and is not
> >interesting here.
> >
> >"Upon successful completion, write() shall return an integer indicating the
> >number of bytes actually written. Otherwise, it shall return a value of -1
> >and set errno to indicate the error."
> >
> >"[ENOSPC] There is no free space remaining on the device containing the
> >file."
> >
> >So, RedHat is correct in not setting errno in this case. And you are
> >correct that ENOSPC is the proper error code for this case, but you only
> >know it is out of space via delphic knowledge. rsync can't possibly know
> >that. I would not want to see rsync assume it knows the proper error code;
> >that's the job of the OS. Instead, rsync should be recoded to continue
> >attempting to write data until get it gets a nonzero error code, not until
> >it gets a short write.
> >
> >Patches welcome.
> >
> >PG
> >--
> >Paul Green, Senior Technical Consultant, Stratus Technologies.
> >Voice: +1 (978) 461-7557; FAX: +1 (978) 461-3610
> >Speaking from Stratus not for Stratus
>
> --
> To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
>
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt
More information about the rsync
mailing list