[Samba] Samba PDC problem: Please help me avoid a mutiny! :-)
Ray Simard
ray.simard at sylvan-glade.com
Sun Nov 10 04:53:00 GMT 2002
I've been beating my head against this one and just can't figure it out. I
hope someone here may have an answer. The employees using the workstations on
this network are getting increasingly upset with this problem.
The problem is wildly varying logon and logoff times over the network. This is
definitely not a matter of long profile transfers. An individual can log onto
a workstation one time and get on quickly, and another time, have to wait
five minutes or more. There is no apparent pattern that I can discern. No
workstations seem to manifest this problem more than others; no users seem to
have more difficulty with this than others; it seems to make no difference if
the user has logged onto a particular station before, or even if he/she's
logged onto another station at the same time.
The network consists of one Samba PDC (version 2.2.6, recently upgraded from
2.2.3a) and about 12 NT 4.0 workstations on two subnets. The problem occurs
with workstations on both the PDC's local subnet and the other one.
Cross-subnet browsing is working fine.
In an effort to troubleshoot this, I set up the log file parameter to create
a separate log for each workstation and user (log file =
/var/log/samba/log.smbd.%m-%U). It helps untangle the mess, and I can merge
the log files when I need to. When running tests I raised the log level to
10, and when I upgraded to 2.2.6, I compiled a test version with some extra
debugging code of my own to help figure it out. Still, I'm baffled.
The manifestation is, in nearly all cases, that the PDC sends a message to the
workstation and waits for a response. The response eventually arrives, and as
far as I can tell, makes sense, but the time that elapses before the reply
from the workstation can sometimes amount to minutes. The workstation event
logs have entries pertaining to these gaps (verified by comparing timestamps)
from the Redirector service, usually saying "The redirector has timed out a
request to SERVICES" (SERVICES is the NetBIOS name of the PDC).
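One way to locate these gaps in a merged log is sketched below in Python. It
is only a sketch under assumptions: it assumes each log message starts with a
header line of the form "[YYYY/MM/DD HH:MM:SS, level]", optionally with
fractional seconds, and the function name find_gaps and the 30-second
threshold are made up for illustration.

```python
import re
from datetime import datetime

# Assumed header format: "[2002/11/10 04:53:00, 10] ...", optionally with
# fractional seconds after the seconds field. Adjust if the logs differ.
HEADER = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d+))?, *\d+\]"
)

def find_gaps(lines, threshold_seconds=30.0):
    """Return (previous, current, seconds) for each silence over the threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = HEADER.match(line)
        if not match:
            continue  # message body line, carries no timestamp
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        if match.group(2):
            # Normalise the fractional part to microseconds.
            micros = int(match.group(2)[:6].ljust(6, "0"))
            stamp = stamp.replace(microsecond=micros)
        if previous is not None:
            delta = (stamp - previous).total_seconds()
            if delta > threshold_seconds:
                gaps.append((previous, stamp, delta))
        previous = stamp
    return gaps
```

Calling find_gaps on the lines of a merged log file returns the start, end,
and length of each quiet stretch, which can then be compared against the
event log timestamps on the workstation.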
Sometimes, however, there is an entry saying, "A write-behind operation has
failed to the remote server services. The data contains the amount requested
to write and the amount actually written." The data dump reads:
00 00 08 00 02 00 52 00
These numbers are consistent in case after case.
These don't seem to make sense as 16-bit values, which would mean zero
requested and 8 written. Read as 32-bit little-endian values, 524288 (0x80000)
was requested and 5373954 (0x520002) was written. Neither reading makes any
sense to me.
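The two readings of the dump can be checked with a short Python sketch. It
assumes the bytes are little-endian, which the event log entry does not state;
the variable names are just for illustration.

```python
import struct

# Data bytes copied from the event log entry.
dump = bytes.fromhex("00 00 08 00 02 00 52 00")

# Reading 1: four unsigned 16-bit little-endian words.
words16 = struct.unpack("<4H", dump)

# Reading 2: two unsigned 32-bit little-endian words.
words32 = struct.unpack("<2I", dump)

print(words16)  # (0, 8, 2, 82)
print(words32)  # (524288, 5373954)
```

Under the 32-bit reading the "written" figure is larger than the "requested"
one, which is part of why the numbers are hard to interpret.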
The socket options are SO_KEEPALIVE TCP_NODELAY IPTOS_LOWDELAY.
I can't imagine a reason why the workstation would try to send something and
the server wouldn't accept it. In a few early tests, I added tcpdump output
to the logs (using high-resolution timestamps to correlate them) and it
appears that the workstations are not even trying to send anything during
that gap.
I'm lost at this point. I really hope someone can help. This problem has been
around for quite some time and the workers are getting tired of it and of my
promises to fix it.
Many thanks in advance,
Ray Simard
ray.simard at sylvan-glade.com