[Samba] Setting Samba Write Cache Size Can Cause File Corruption
Volker Lendecke
Volker.Lendecke at SerNet.DE
Tue Sep 28 15:41:50 MDT 2010
On Tue, Sep 28, 2010 at 05:34:37PM -0400, Andy Liebman wrote:
> Back in June we had a thread going on this list about a problem we were
> seeing in which Disk I/Os on a Linux server periodically dropped out for
> a fraction of a second under very high Samba load (high load = 100s of
> MB/sec for both Read and Write).
>
> If you are interested in the details of the old thread, search the Samba
> list for "Possible Issue with Samba Blocking I/O and CPU"
>
> Anyway, we came to the conclusion that using the Samba variable "write
> cache size = 262144" could significantly reduce the incidence of these
> I/O drop outs. If we understand correctly, this setting influences the
> minimum amount of data that Samba will send to the filesystem in a given
> write event. We suspect setting this value (versus not setting it) can
> provide a mechanism to help keep Samba writes aligned to the stripe size
> and stripe width of a hardware RAID array and help reduce or eliminate
> so-called "partial stripe writes".
>
> After months of successful testing and real-world use, we believe we
> found a situation in which setting the "write cache size" causes a
> serious glitch. Searching Google with the terms "Samba 'write cache
> size' file corruption" yields a few prior cases of reported corruption,
> one with a patch to Samba in October of 2002. A couple of subsequent
> reports seem to have remained unresolved.
>
> In our situation, when "write cache size" is set to 262144 and when a
> certain non-linear video editing application imports still images and
> saves them as a single video frame to a Samba share, under some
> circumstances the file can get corrupted. At least that's what the
> editing application says about the file. Most of the time, importing
> works out just fine. For all codecs tested -- ranging from 25 Mbit/sec
> "DV25" to 100 Mbit/sec "DVC Pro HD" to uncompressed SD and HD
> (respectively, about 28 and 160 MB/sec) -- imported still frame images
> are fine when the video standard is NTSC. But for PAL video (where each
> frame is slightly larger in size, but the total MB/sec is slightly lower
> due to 25 versus 29.97 frames per second) we found a couple of medium
> data rate codecs where the imported still frames always get corrupted.
> It is 100 percent reproducible.
>
> The problem doesn't seem to be the actual SIZE of the files. In other
> words, it's not like you pass some size threshold and then see the
> problem, or even that there are particular file sizes that cause
> problems. You can import a still image as a DV25 PAL frame and get a
> 641KB file and you can import the same still image as a DV50 NTSC frame
> and also get a 641KB file and the PAL file is always corrupted but the
> NTSC file is always fine. (I know it is weird that the files are the
> same size -- NTSC DV50 has double the data rate per sec and 20 percent
> more frames per second than PAL DV25, so a single sample frame from DV50
> NTSC should be approximately 2*(PAL DV25)*(25/30) or 1.67*(PAL DV25).
> But they are the same. What can I say?).
>
> I can tell you that setting write cache size to 131072 (half the size)
> makes the corruption go away, and so does turning off the "write cache
> size" setting altogether. However, we are now wondering why the "write
> cache size" can have this effect on file corruption and whether setting
> it to 131072 will cause a corruption problem under some other
> circumstance we just haven't hit yet.
>
> Any ideas? By the way, we have seen and documented this problem with
> both Samba 3.4.2 and Samba 3.5.3. We also noticed that "write cache
> size" was listed as "deprecated" in 3.4.2 and that in 3.5.3 it is no
> longer listed as "deprecated". Somebody besides us must have thought
> keeping "write cache size" was still a good idea??
Well, it *is* a crap idea that happens to be wildly
successful in tuning weird workloads.
A couple of weeks (months?) ago I did some considerable
tuning of the write cache, which changed a few code paths. I
would really like to see whether master (or v3-6-test) still
corrupts files. If it does, I am very interested in fixing
that. What I would need is a debug level 10 log of smbd
while it corrupts a file, together with a network trace and
an strace. That is probably a HUGE amount of data, but
unfortunately it is necessary.
Volker
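
One way to narrow down a corruption report like the one above, before
collecting the full logs, is to compare a known-good copy of the file
(for example, one written with "write cache size" unset) against the
corrupted one and see where they differ. The following is only a
minimal sketch under that assumption: the function names diff_ranges
and describe are hypothetical and not part of Samba, and whether the
offsets line up with the cache size is something to check, not
something this thread establishes.

```python
import sys

CACHE_SIZE = 262144  # the "write cache size" value from the report above


def diff_ranges(good_path, bad_path, block=65536):
    """Return (start, end) byte ranges where the two files differ.

    end is exclusive. If one file is shorter, the missing tail is
    reported as a differing range.
    """
    ranges = []
    start = None  # start offset of the mismatch run currently open
    offset = 0
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        while True:
            a = good.read(block)
            b = bad.read(block)
            if not a and not b:
                break
            for i in range(max(len(a), len(b))):
                same = i < len(a) and i < len(b) and a[i] == b[i]
                if not same and start is None:
                    start = offset + i
                elif same and start is not None:
                    ranges.append((start, offset + i))
                    start = None
            offset += max(len(a), len(b))
    if start is not None:
        ranges.append((start, offset))
    return ranges


def describe(ranges, cache_size=CACHE_SIZE):
    """Print each differing range and its offset relative to cache_size."""
    for start, end in ranges:
        print(f"bytes {start}-{end - 1} differ "
              f"(length {end - start}, "
              f"start offset mod {cache_size} = {start % cache_size})")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python script.py good_file bad_file
    describe(diff_ranges(sys.argv[1], sys.argv[2]))
```

If the differing ranges start at or near multiples of 262144 (or of
131072), that would point at the cache flush path; if they do not, the
log, network trace and strace are still the data needed to find out
what happened.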