improper check on written in non mmap path of tdb_write, tdb 1.2.9
Rusty Russell
rusty at rustcorp.com.au
Mon Feb 7 20:36:01 MST 2011
On Tue, 8 Feb 2011 12:27:24 am Michael Adam wrote:
> Hi,
>
> sorry for the late reply.
>
> Rusty, you are right in that I did those changes.
> Actually they were done in oder to treat only the correct
> error conditions on pwrite as errors. But it seems
> something is not quite correct here. Thanks, James,
> for spotting this.
>
> There is another place where I did the first patch like this:
> it is tdb_expand_file() in common/io.c
> But there, I think I did it right or at least better
> (commits d568e2b and af8432e in master):
>
> 263 /* now fill the file with something. This ensures that the
> 264 file isn't sparse, which would be very bad if we ran out of
> 265 disk. This must be done with write, not via mmap */
> 266 memset(buf, TDB_PAD_BYTE, sizeof(buf));
> 267 while (addition) {
> 268 size_t n = addition>sizeof(buf)?sizeof(buf):addition;
> 269 ssize_t written = pwrite(tdb->fd, buf, n, size);
> 270 if (written == 0) {
> 271 /* prevent infinite loops: try _once_ more */
> 272 written = pwrite(tdb->fd, buf, n, size);
> 273 }
> 274 if (written == 0) {
> 275 /* give up, trying to provide a useful errno */
> 276 TDB_LOG((tdb, TDB_DEBUG_FATAL, "expand_file write "
> 277 "returned 0 twice: giving up!\n"));
> 278 errno = ENOSPC;
> 279 return -1;
> 280 } else if (written == -1) {
> 281 TDB_LOG((tdb, TDB_DEBUG_FATAL, "expand_file write of "
> 282 "%d bytes failed (%s)\n", (int)n,
> 283 strerror(errno)));
> 284 return -1;
> 285 } else if (written != n) {
> 286 TDB_LOG((tdb, TDB_DEBUG_WARNING, "expand_file: wrote "
> 287 "only %d of %d bytes - retrying\n", (int)written,
> 288 (int)n));
> 289 }
> 290 addition -= written;
> 291 size += written;
> 292 }
>
> At least it is better in two respects:
> - it tries more than once
> - it correctly corrects the summed up size of what has been written
> maybe the made-up errno of ENOSPC is not good.
>
> Doing these patches was triggered by a real error situation
> over which I stumbled, in which using
> (return code of pwrite) != (bytes to be written)
> really was wrong. I can't quite remember what it was right now, but
> I will try to reproduce it.
This is still voodoo programming. And it's due to bad commit messages,
so we don't know what was actually being fixed :(
I suspect you were actually hitting quota or equiv, but I'm nervous in case
we're wrong. I'm tempted to apply your patches, but *not* do it for TDB2,
where I will just report a failure.
Thanks,
Rusty.
More information about the samba-technical
mailing list