Potential memory corruption in fileio.c:write_file() since 3.0.20 with write cache size

Fri Aug 3 15:06:46 GMT 2007

server: Samba 3.0.23c on CentOS5 x86_64
client: Autodesk Maya 8.5 (a 3D modeling and animation application)  
on Windows XP SP2 32 bit

After upgrading from Fedora Core 4/Samba 3.0.23a, I started  
experiencing mysterious hangs with Samba, which seemed to be  
triggered when writing large scene files from Maya 8.5 in Maya Binary  
format to a Samba share.

After some digging, I came to the following conclusion:

The share in question had:

write cache size = 32768

set in smb.conf. Turns out that when writing out largish files  
(around 50MB), in some cases Maya likes to write a single byte 49152  
bytes past the current end of file. This would trigger the following  
optimization in write_file() in smbd/fileio.c around line 510:

} else if ( (pos >= wcp->file_size) &&
   (n == 1) &&
   (pos < wcp->offset + 2*wcp->alloc_size) &&
   (wcp->file_size == wcp->offset + wcp->data_size)) {

   SMB_BIG_UINT new_start = wcp->offset + wcp->data_size;

   flush_write_cache(fsp, WRITE_FLUSH);
   wcp->offset = new_start;
   wcp->data_size = pos - new_start + 1;
   memset(wcp->data, '\0', wcp->data_size);
   memcpy(wcp->data + wcp->data_size-1, data, 1);

an optimization which appeared in 3.0.20, and claims to help with  
Microsoft Office behavior. With "write cache size" set to 32768, this  
means that wcp->alloc_size is 32768. If the application tries to  
write 1 byte 49152 bytes past the end of the file, the condition will  
trigger, and the call to memset() will write past the end of the  
32768 byte long piece of memory allocated for the write cache  
bufferby setup_write_cache(), causing heap corruption. valgrind  
confirms that this is happening. In theory, I should have been seeing  
this with 3.0.23a on FC4 as well, but I guess I just got lucky.

In my case, this would cause a call to free() to fail somewhere down  
the line: the SIGABORT handler would try to pretty print a stack  
trace or something, but that would in turn try to call malloc()/free 
(), and hang on a futex() call. So the errant smbd would never  
actually die, hold on to a lock on /var/cache/samba/locking.tbd, and  
eventually all other smbds would grind to a halt when they would try  
to lock locking.tdb

The fix appears to be simple: just make sure that the condition only  
fires if you are trying to write no further than wcp->data_size from  
the current end of file:

} else if ( (pos >= wcp->file_size) &&
   (n == 1) &&
   (pos < wcp->offset + wcp->alloc_size) &&
   (wcp->file_size == wcp->offset + wcp->data_size)) {

Minimal testing shows this to work for my case, but I haven't looked  
at the code in detail to make sure it always works. An alternative  
could be to comment out the entire if() clause and do away with the  
optimization.

This is my first attempt at looking at Samba internals, so I could  
have missed something obvious...

Jean-Francois Panisset
A52/Rock Paper Scissors