improper check on written in non mmap path of tdb_write, tdb 1.2.9

Thu Jan 27 20:19:29 MST 2011

Hi all, I was just casually reading the sources for tdb 1.2.9 and in tdb_write i think there may be a bug in the non mmap path.

The non mmap path begins at line 94:

 94:   } else {
 95:      ssize_t written = pwrite(tdb->fd, buf, len, off);
 96:      if ((written != (ssize_t)len) && (written != -1)) {
 97:         /* try once more */
 98:         tdb->ecode = TDB_ERR_IO;
 99:         TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_write: wrote only "
100:             "%d of %d bytes at %d, trying once more\n",
101:             (int)written, len, off));
102:         written = pwrite(tdb->fd, (const char *)buf+written,
103:                len-written,
104:                off+written);
105:      }
106:      if (written == -1) {
107:         /* Ensure ecode is set for log fn. */
108:         tdb->ecode = TDB_ERR_IO;
109:         TDB_LOG((tdb, TDB_DEBUG_FATAL,"tdb_write failed at %d "
110:             "len=%d (%s)\n", off, len, strerror(errno)));
111:         return -1;
112:      } else if (written != (ssize_t)len) {
113:         tdb->ecode = TDB_ERR_IO;
114:         TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_write: failed to "
115:             "write %d bytes at %d in two attempts\n",
116:             len, off));
117:         return -1;
118:      }
119:   }

At line 95 the first attempt to call pwrite returns the bytes stored into "written". Lets call that value written1.

Line 96 tests to see if it was an incomplete write, and if so, at 102 a second attempt is made. This second attempt starts written1 bytes later (with 0 <= written1 <= len - 1), and returns the bytes stored also into written. Lets call that written2.

Later on line 112 we check to see that written is the same as len, which would be fine if the first call to pwrite completes fully or was interrupted with zero bytes stored, but, and here's the bug, if the first call to pwrite writes partially and the second finishes fully, written has the value of written2, not the value of written1 + written2.

If this occurs, the test at line 112 will report failure even though the whole the operation completed successfully.

Isn't it a little premature for line 98 to set the ecode to TDB_ERR_IO before we're sure whe whole operation has failed? 

Also, should TDB_LOG on line 99 really have TDB_DEBUG_FATAL before we retry the operation? Maybe TDB_DEBUG_WARNING would be more appropriate.

tdb_read doesn't have this problem because it doesn't do a try again if there is an incomplete read, although maybe there should be a retry to be consistent with tdb_write.