[Samba] Setting Samba Write Cache Size Can Cause File Corruption

Andy Liebman andyliebman at aol.com
Tue Sep 28 15:34:37 MDT 2010


Back in June we had a thread going on this list about a problem we were 
seeing in which disk I/O on a Linux server periodically dropped out for 
a fraction of a second under very high Samba load (high load = 100s of 
MB/sec for both read and write).

If you are interested in the details of the old thread, search the Samba 
list for "Possible Issue with Samba Blocking I/O and CPU".

Anyway, we came to the conclusion that setting the smb.conf parameter 
"write cache size = 262144" could significantly reduce the incidence of 
these I/O dropouts. If we understand correctly, this setting influences 
the minimum amount of data that Samba will send to the filesystem in a 
given write event. We suspect that setting this value (versus not 
setting it) can provide a mechanism to help keep Samba writes aligned to 
the stripe size and stripe width of a hardware RAID array and help 
reduce or eliminate so-called "partial stripe writes".
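
To make the alignment idea concrete, here is a minimal Python sketch of 
what we mean by a "full stripe write". The RAID geometry in it (64 KiB 
chunks across 4 data disks) is a made-up example chosen only because it 
multiplies out to 262144; it is not the geometry of any particular array, 
and the function names are just for illustration.

```python
def full_stripe_bytes(chunk_bytes: int, data_disks: int) -> int:
    """Bytes in one full stripe: per-disk chunk size times number of data disks."""
    return chunk_bytes * data_disks


def is_full_stripe_write(offset: int, length: int, stripe_bytes: int) -> bool:
    """True when a write starts on a stripe boundary and covers whole stripes only."""
    return length > 0 and offset % stripe_bytes == 0 and length % stripe_bytes == 0


# Hypothetical geometry: 64 KiB chunk per disk, 4 data disks.
stripe = full_stripe_bytes(64 * 1024, 4)
print(stripe)                                    # 262144
print(is_full_stripe_write(0, 262144, stripe))   # True
print(is_full_stripe_write(0, 65536, stripe))    # False: only part of a stripe
```

Under that assumed geometry, a 262144-byte write starting on a stripe 
boundary touches exactly one full stripe, while a smaller write would 
force the controller to handle a partial stripe.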

After months of successful testing and real-world use, we believe we 
have found a situation in which setting the "write cache size" causes a 
serious glitch. Searching Google with the terms "Samba 'write cache 
size' file corruption" yields a few prior cases of reported corruption, 
one with a patch to Samba in October of 2002. A couple of subsequent 
reports seem to have remained unresolved.

In our situation, when "write cache size" is set to 262144 and when a 
certain non-linear video editing application imports still images and 
saves each one as a single video frame to a Samba share, under some 
circumstances the file can get corrupted. At least that's what the 
editing application says about the file. Most of the time, importing 
works out just fine. For all codecs tested -- ranging from 25 Mbit/sec 
"DV25" to 100 Mbit/sec "DVCPRO HD" to uncompressed SD and HD 
(respectively, about 28 and 160 MB/sec) -- imported still frame images 
are fine when the video standard is NTSC. But for PAL video (where each 
frame is slightly larger in size, but the total MB/sec is slightly lower 
due to 25 versus 29.97 frames per second) we found a couple of medium 
data rate codecs where the imported still frames always get corrupted. 
It is 100 percent reproducible.
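
Since so far we only have the editing application's word that the file 
is bad, one way to check independently would be to compare the file on 
the share against a known-good copy written to local disk. The following 
is only a sketch of that idea in Python using the standard hashlib 
module; the function names are made up for illustration and this is not 
something we have run as part of our testing.

```python
import hashlib


def sha256_of(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def first_difference(path_a, path_b, block_size=64 * 1024):
    """Return the byte offset where two files first differ, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if block_a != block_b:
                # Find the exact differing byte inside this block.
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                # Common prefix matches, so one file is simply shorter.
                return offset + min(len(block_a), len(block_b))
            if not block_a:
                return None  # both files ended with no difference found
            offset += len(block_a)
```

If the digests of the local copy and the copy on the share differ, 
first_difference() gives the offset of the first bad byte, and it would 
be interesting to see whether that offset falls near a multiple of 
262144.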

The problem doesn't seem to be the actual SIZE of the files. In other 
words, it's not like you pass some size threshold and then see the 
problem, or even that there are particular file sizes that cause 
problems. You can import a still image as a DV25 PAL frame and get a 
641KB file, and you can import the same still image as a DV50 NTSC frame 
and also get a 641KB file, yet the PAL file is always corrupted and the 
NTSC file is always fine. (I know it is weird that the files are the 
same size -- NTSC DV50 has double the data rate per second and 20 
percent more frames per second than PAL DV25, so a single frame from 
DV50 NTSC should be approximately 2*(PAL DV25)*(25/30), or about 
1.67*(PAL DV25). But they are the same. What can I say?)
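
For anyone who wants to check the arithmetic in that parenthetical, here 
it is as a few lines of Python. It only uses the nominal codec data 
rates and frame rates mentioned above and assumes a frame's size is 
simply data rate divided by frame rate; the function name is made up 
for illustration.

```python
def expected_frame_ratio(rate_a_mbit, fps_a, rate_b_mbit, fps_b):
    """Ratio of per-frame sizes, assuming frame size = data rate / frame rate."""
    return (rate_a_mbit / fps_a) / (rate_b_mbit / fps_b)


# DV50 NTSC (50 Mbit/sec at 29.97 fps) versus DV25 PAL (25 Mbit/sec at 25 fps).
ratio = expected_frame_ratio(50, 29.97, 25, 25)
print(round(ratio, 2))  # 1.67
```

So on paper the DV50 NTSC frame should be roughly two-thirds larger than 
the DV25 PAL frame, which is why the identical 641KB file sizes look odd.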

I can tell you that setting write cache size to 131072 (half the size) 
makes the corruption go away, and so does turning off the "write cache 
size" setting altogether.  However, we are now wondering why the "write 
cache size" can have this effect on file corruption and whether setting 
it to 131072 will cause a corruption problem under some other 
circumstance we just haven't hit yet.

Any ideas? By the way, we have seen and documented this problem with 
both Samba 3.4.2 and Samba 3.5.3. We also noticed that "write cache 
size" was listed as "deprecated" in 3.4.2 and that in 3.5.3 it is no 
longer listed as "deprecated". Somebody besides us must have thought 
keeping "write cache size" was still a good idea?

Andy Liebman
