Fixed: OpLocks caused the corruptions/slowness (Was: How Samba let us down)
jay at jayts.cx
Wed Oct 23 23:07:02 GMT 2002
Chris de Vidal wrote:
> My first post, for reference:
> When the new NT server's hard drive died, we decided
> to keep hobbling along on Samba. Meanwhile, my
> supervisor was searching around on OpLock issues on
> Google and he saw other people that were having
> similar problems. We disabled all OpLocks (kernel,
> level I and II, kernel at the global level, level X at
> the share) early this morning, and since then things
> have been fine! Yesterday and the day before, the
> problem appeared quickly, so (knock on wood), I think
> we did fix it. Time will tell.
Cool. Now it's probably *way* inconvenient, but it
would be great to test thoroughly, then re-enable those
oplock settings one at a time to see if the problem comes back.
> * We are using RedHat 7.3 (no ACLs included) but
> created a custom kernel (2.4.19) with ext3 ACL support
> and installed all of the userland ACL tools.
Again, sorry about that: I had gotten confused (ACLs
with kernel oplocks) because there was another thread
I replied to in which ACLs were being discussed. And
I wasn't awake yet. :(
> * The corruption was missing records. It would
> interrupt the print process and the Opus analysis
> indicated hundreds of records were missing. It would
> happen in random places in print files (hundreds of
> megs to gigs in size), and seldomly would not happen
> at all.
I still don't understand! Ok, the files are not printed
on the Samba host, they are printed through an NT
print server, correct? So are you saying that it's
files served by Samba that are being sent to the printer,
and that's where you're losing data?
[ok I just re-read your original post...] You said that
the Samba server is used as a "print spooling area".
Can you elucidate? It seems you are offering a Samba
file share, which is used by another system(s?) for
NT's printer spool files.
> If it is _officially_
> recognized by the developers as a caveat, it ought to
> be put into the docs/manpages. I apologize if it IS
> there but I missed it.
There are some "dangerous" smb.conf parameters, and
AFAIK (maybe not infinitely far ;) the Samba Team
have documented that they can be misused in a way
that can result in corruption.
Did you check the manual page for smb.conf(5), especially
for the parameters having to do with locking, to check
that you weren't doing anything wrong?
> Anyway, it appears to have been fixed. I don't yet
> know what kind of performance hit we will see, but so
> far, so good.
It might not be so bad. Actually, for large database
files, it may speed things up quite a bit (and avoid
problems) to have the oplocks turned off. This is a
"known thing". [again, after re-reading your original
message...] Aha, you say that the Samba server is
serving flat database files. If those database files
are large, this by itself says "turn oplocks off".
And this may apply to the files in the share you're
using as a print spool area, too.
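
As a rough illustration of what "oplocks off" can look like in smb.conf,
here is a minimal sketch. The share name and path are made up for the
example; the parameter names (kernel oplocks, oplocks, level2 oplocks)
are the ones described in smb.conf(5). Treat it as a starting point to
check against your own setup, not as a recommended configuration.

```ini
[global]
    # kernel oplocks is a global parameter
    kernel oplocks = no

# Hypothetical share holding large flat database files
[dbfiles]
    path = /srv/samba/dbfiles
    # Turn off level I (exclusive) oplocks for this share
    oplocks = no
    # Turn off level II (read-only) oplocks for this share
    level2 oplocks = no
```

Setting the two share-level parameters per share, rather than in [global],
leaves oplocks (and their caching benefit) enabled for other shares.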
Here is a "sneak preview" excerpt from the second edition
of Using Samba, regarding use of oplocks:
|Generally, we recommend using the defaults provided by Samba:
|standard DOS/Windows deny-mode locks for compatibility and
|oplocks for the extra performance that local caching allows.
|If your operating system can take advantage of oplocks, it
|should provide significant performance improvements.
|One very notable exception is large data files, such as those
|used by database software. If a client is allowed to oplock this
|kind of file, there is a huge delay while the client copies the
|entire file from the server in order to cache it, even though it
|may only need to update one record. The situation goes from bad
|to worse when another client tries to open the oplocked file. The
|first client must write the entire file back to the server before
|the second client's file open request can succeed. This results in
|another huge delay (for both clients) which in practice often
|results in a failed open due to a timeout on the second client,
|perhaps along with a message warning of possible database corruption!
|You can set veto oplock files, as in the previous example, to avoid
|this kind of problem.
Just to head off another bunch of comments from the Samba Team,
please understand that just because you get a message from Windows
that says your database is *possibly* corrupt, it doesn't mean
that your database *is* corrupt. OK? ;-)
Aside from that, I welcome any comments on the above excerpt.
It was suggested by David Collier-Brown, my co-author, and
he had to explain it carefully to me before I wrote about it.
Any suggestions for improving the discussion are welcome!
The comment about the 'veto oplock files' parameter applies
when you have just a few files in the share that may be
problematic, like a single, huge file in a share, or some
number of files with somehow-similar names that can be matched
using a file globbing pattern or patterns.
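
For instance, a share could keep oplocks on in general but refuse them
for a few file types. This is only a sketch: the share name, path, and
the *.dbf and *.mdb patterns are assumptions chosen for illustration.
The slash-separated pattern list is the syntax smb.conf(5) gives for
'veto oplock files'.

```ini
# Hypothetical share: oplocks stay on, except for matching files
[appdata]
    path = /srv/samba/appdata
    oplocks = yes
    # Never grant oplocks on files matching these patterns
    veto oplock files = /*.dbf/*.mdb/
```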
> We might reenable kernel then
> regular then level2 oplocks later to see if it was
> just one particular type.
Pretty please! I'm really curious to find out exactly
what was happening.