[Samba] Setting Samba Write Cache Size Can Cause File Corruption
andyliebman at aol.com
Tue Sep 28 15:34:37 MDT 2010
Back in June we had a thread going on this list about a problem we were
seeing in which disk I/O on a Linux server periodically dropped out for
a fraction of a second under very high Samba load (high load = 100s of
MB/sec for both reads and writes).
If you are interested in the details of the old thread, search the Samba
list for "Possible Issue with Samba Blocking I/O and CPU".
Anyway, we came to the conclusion that setting the Samba parameter "write
cache size = 262144" could significantly reduce the incidence of these
I/O dropouts. If we understand correctly, this setting influences the
minimum amount of data that Samba will send to the filesystem in a given
write event. We suspect setting this value (versus not setting it) can
provide a mechanism to help keep Samba writes aligned to the stripe size
and stripe width of a hardware RAID array and help reduce or eliminate
so-called "partial stripe writes".
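To make the alignment idea concrete, here is a minimal Python sketch that checks whether a given write size is a whole multiple of a RAID full stripe. The stripe unit and disk count below are hypothetical (the geometry of our array is not given in this post), and the function name is made up for illustration. A size that is a multiple of the full stripe is only a necessary condition: the write offsets would also have to fall on stripe boundaries.

```python
def is_stripe_aligned(write_size: int, stripe_unit: int, data_disks: int) -> bool:
    """Return True if write_size is a whole multiple of the full stripe width.

    stripe_unit is the per-disk chunk size in bytes; data_disks excludes parity.
    """
    full_stripe = stripe_unit * data_disks
    return write_size % full_stripe == 0


# Hypothetical geometry: 64 KiB stripe unit across 4 data disks = 256 KiB full stripe.
STRIPE_UNIT = 64 * 1024
DATA_DISKS = 4

for size in (262144, 131072):
    print(size, is_stripe_aligned(size, STRIPE_UNIT, DATA_DISKS))
```

With this assumed geometry, 262144 bytes covers exactly one full stripe while 131072 covers only half of one, so the smaller value could reintroduce partial stripe writes on such an array.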
After months of successful testing and real-world use, we believe we
found a situation in which setting the "write cache size" causes a
serious glitch. Searching Google with the terms "Samba 'write cache
size' file corruption" yields a few prior cases of reported corruption,
one with a patch to Samba in October of 2002. A couple of subsequent
reports seem to have remained unresolved.
In our situation, when "write cache size" is set to 262144 and when a
certain non-linear video editing application imports still images and
saves them as a single video frame to a Samba share, under some
circumstances the file can get corrupted. At least that's what the
editing application says about the file. Most of the time, importing
works out just fine. For all codecs tested -- ranging from 25 Mbit/sec
"DV25" to 100 Mbit/sec "DVC Pro HD" to uncompressed SD and HD
(respectively, about 28 and 160 MB/sec) -- imported still frame images
are fine when the video standard is NTSC. But for PAL video (where each
frame is slightly larger in size, but the total MB/sec is slightly lower
due to 25 versus 29.97 frames per second) we found a couple of medium
data rate codecs where the imported still frames always get corrupted.
It is 100 percent reproducible.
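One way to confirm the corruption independently of the editing application would be to save the same frame once to local disk and once to the Samba share, then compare checksums. The sketch below is only an illustration of that check, not something we ran; the helper names and the example paths in the comments are hypothetical. Reading the share copy directly on the server would avoid any client-side caching hiding a difference.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunk_size pieces."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(local_copy: Path, share_copy: Path) -> bool:
    """Return True if both files have identical contents."""
    return sha256_of(local_copy) == sha256_of(share_copy)


# Example use (hypothetical paths, adjust to your setup):
#   files_match(Path("/tmp/still_pal_dv25.mxf"),
#               Path("/mnt/samba_share/still_pal_dv25.mxf"))
```

If the two digests differ only when "write cache size" is set, that would point at the write path rather than at the application's own validation.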
The problem doesn't seem to be the actual SIZE of the files. In other
words, it's not like you pass some size threshold and then see the
problem, or even that there are particular file sizes that cause
problems. You can import a still image as a DV25 PAL frame and get a
641KB file, and you can import the same still image as a DV50 NTSC frame
and also get a 641KB file, yet the PAL file is always corrupted and the
NTSC file is always fine. (I know it is weird that the files are the
same size -- NTSC DV50 has double the data rate per second and 20 percent
more frames per second than PAL DV25, so a single frame from DV50
NTSC should be approximately 2*(PAL DV25)*(25/30), or about 1.67*(PAL DV25).
But they are the same. What can I say?)
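The back-of-the-envelope ratio above can be checked with a few lines of Python. This is a rough sketch that uses only the nominal codec bit rates (25 and 50 Mbit/sec) and rounds NTSC to 30 frames per second, as in the formula above; it ignores audio, headers and any container overhead, which may well be why the real files come out the same size.

```python
def bytes_per_frame(mbit_per_sec: float, fps: float) -> float:
    """Nominal bytes per video frame for a given bit rate and frame rate."""
    return mbit_per_sec * 1_000_000 / 8 / fps


pal_dv25 = bytes_per_frame(25, 25)    # PAL DV25: 25 Mbit/sec at 25 fps
ntsc_dv50 = bytes_per_frame(50, 30)   # NTSC DV50: 50 Mbit/sec at ~30 fps

ratio = ntsc_dv50 / pal_dv25
print(f"expected NTSC DV50 / PAL DV25 frame size ratio: {ratio:.2f}")
```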
I can tell you that setting write cache size to 131072 (half the size)
makes the corruption go away, and so does turning off the "write cache
size" setting altogether. However, we are now wondering why the "write
cache size" can have this effect on file corruption and whether setting
it to 131072 will cause a corruption problem under some other
circumstance we just haven't hit yet.
Any ideas? By the way, we have seen and documented this problem with
both Samba 3.4.2 and Samba 3.5.3. We also noticed that "write cache
size" was listed as "deprecated" in 3.4.2 and that in 3.5.3 it is no
longer listed as "deprecated". Somebody besides us must have thought
keeping "write cache size" was still a good idea?