Fixed: OpLocks caused the corruptions/slowness (Was: How Samba let us down)

John H Terpstra jht at samba.org
Tue Oct 29 16:15:49 GMT 2002


On Tue, 29 Oct 2002, David Brodbeck wrote:

>
>
> > -----Original Message-----
> > From: Green, Paul [mailto:Paul.Green at stratus.com]
>
> > My opinion is that the "right fix" is for anyone who is
> > experiencing data corruption of any sort, whether with oplocks on, off, or
> > sideways, to work with the Samba team to come up with a reproducible test
> > case so that we can root cause the true source of the problem.  Then, we can
> > design and test some sort of fix, and no one else will ever have to worry
> > about it.
>
> What I'm seeing from the Samba team is "this is a Windows client bug" or
> "this is an MS Access bug".  I'm not saying they're wrong, but if that's the
> conclusion that's been reached wouldn't the rest of us just be wasting our
> time by trying to test this?  The consensus seems to be that oplocks with
> Windows clients are simply broken by design.

Correct, but we still need to emulate the way it works correctly. So if we
have a bug, we need to find and fix it. We need help from our users to
create the test case that reproduces the problem. In the absence of this
all we can really do is offer empathy with the pain.

>
> FWIW, I've never seen any corruption I could blame on Samba, with oplocks
> on, but my site only has 30 users, tops, and the most we ever had using the
> Access database simultaneously was five or six.  (I did turn kernel oplocks
> off a couple months ago, but only because we don't need them -- nothing gets
> accessed from the UNIX side except during backups.)  We actually saw more
> corruption in the Access database under Windows NT, but I blame this on a
> user who had a bad network connection that we discovered about the time we
> switched to Samba.

This is not an uncommon finding. I have followed up with many users who
complained of Linux and/or Samba problems, only to find that they were
already having problems with MS Windows NT and had decided to try Samba
instead. When that fails too, they turn to this list (or even mail team
members directly) complaining that Samba is broken. We all know that all
software is likely to be broken in some way: bugs are inevitable, and the
risk increases exponentially with the size of the code base. (Don't flame
me for this statement please ;))

Here are the more common causes of corruption problems:

	1. Defective hubs/switches (especially the cheaper varieties)
	2. Defective network cards
	3. Defective routers (in particular, incorrect use of NetBIOS
		UDP forwarding)
	4. Defective hard disk on server
	5. ESD (electrostatic discharge) damage to motherboard
		- many older style motherboards suffered ESD damage to
		  the interrupt controller chip.
	6. Bad TCP/IP configuration _or_ inconsistent installation of
	   multiple network protocols (on MS Windows clients)
		- e.g. inconsistent LANA ordering on MS Windows (9X, NT, ...)

I am sure that with a little effort we can expand this list, just as I
am certain that when someone is in trouble they like to find help, though
some do it by "blaming the gasoline when the tires wear out".

I do agree that we could better document the ins and outs of data
corruption and how to correctly diagnose a problem situation. Then again,
when in the heat of a serious problem, it is a bit trying to remember to
RTFM, isn't it?
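
As a starting point for that kind of documentation, here is a minimal
sketch of the oplock-related smb.conf settings discussed in this thread.
The share name [accessdb] and its path are hypothetical, and whether
"kernel oplocks" is a global or a per-share parameter depends on the Samba
version, so check the smb.conf man page for your release before copying
any of it. It is a workaround for isolating a problem, not a fix for any
underlying bug.

```
[global]
	# Kernel oplocks only matter when files are also opened from the
	# UNIX side (local processes, NFS). Assumed unnecessary here.
	kernel oplocks = no

[accessdb]
	# Hypothetical share holding a shared MS Access database.
	path = /srv/samba/accessdb
	# Do not grant oplocks, so clients do not cache these files locally.
	oplocks = no
	level2 oplocks = no
```

If only the database files are affected, another option is to leave
oplocks enabled for the share and list those files in "veto oplock files"
(for example /*.mdb/), so that only they lose client-side caching.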


> This would tend to back up the theory that dropped
> packets aggravate this problem.  It's rather shocking to me that SMB reacts
> so poorly to network problems, but I realize there's not much Samba can do
> about the crummy protocol design. ;)


- John T.
-- 
John H Terpstra
Email: jht at samba.org




More information about the samba-technical mailing list