[Samba] PDC and BDC load-balancing
norman at lateral.co.za
Sat Jul 27 10:27:02 GMT 2002
I would like to know how Samba / Windows determines which domain
controller should handle a logon request, and whether there is a way I
can affect the process.
Here's the situation: I have a school installation running a Samba
domain, with a PDC (1.1GHz Celeron, 256 MB RAM) and one BDC (much
smaller, 366MHz with 64 MB RAM), both running Red Hat 7.1 and Samba 2.2.5.
There are about 80 Windows NT and Windows 2000 workstations, and about
10 Windows 98 machines.
At first I just had a PDC. Students use different computers from one
lesson to the next, and mess with settings a lot, so the choice to use
mandatory roaming profiles was an obvious one. Also, some workstations
have fairly small hard drives, so I stopped Windows from caching
profiles locally. From experience we know that a hard drive can fill up
with cached profiles, and Windows falls over.
When we tested this everything worked beautifully ... until a class all
logged in simultaneously. Suddenly there were 35 simultaneous requests
for an 810KB profile, and somewhere there was a bottleneck. I thought the
PDC would be able to cope with that, and I thought the network was fast
enough to deal with that (100Mbps ethernet), but I was wrong. The
workstations eventually timed out, and went back to the login screen,
but Samba kept on trying to open new connections, several for each
workstation, and the smbd process for each workstation got more and more
demanding. They didn't respond to SIGTERM and I had to kill them.
So I figured Plan A was to install a BDC. I didn't have another 1.1GHz
machine, so I decided to test with a smaller one, see how it affected
the PDC's load, and take it from there.
So here's my problem: It didn't affect the load much at all! Log 35
users in at the same time, and Samba still grinds to a halt on the PDC,
but the BDC only services about 3 or 4 connection requests. In fact, try
it with 5 users, and the PDC still gets caught in a vicious smbd process
cycle, and the BDC might service 1 request. Is it because the BDC is a
significantly slower machine? Even though it's slower, it's
underutilised, and the PDC is swamped. I put the BDC on the same switch
as the workstations, so traffic-wise, it should be getting the requests
sooner than the PDC. Is there a way of increasing the likelihood of the
BDC servicing a logon request?
If I know BDCs are what I need, I'll set up more. Any advice on
estimating how many, and CPU and memory requirements would be very welcome.
In the meantime, I've adopted a Plan B: I've re-enabled caching
profiles. This has solved the worst of the problem, because after a user
has logged on once, the profile doesn't need to be transferred again, and
all the PDC needs to do is authenticate. But this is not a solution for
workstations with small drives, and I'm going to need BDCs to balance
the load.
Or maybe I need a Plan C: Maybe the real problem is that the network is
so busy that connections are timing out. Then Samba opens a new
connection. The new connection times out, so Samba opens a new one, etc,
etc. If it's a network traffic problem, then no matter how many BDCs I
have pushing out profiles, the network will still be overloaded, and the
connections will still time out, and all BDCs will have the same problem
as my PDC. Should I just be getting bigger hard drives for the
workstations so that cached profiles never get too big? Or get a fibre
backbone? If each profile is 810KB, then 10 are about 8MB, or about 65Mbits.
That should take under 1 second across a 100Mbps LAN. (Although I can see it
takes a lot more than a second for Windows to load a remote profile.) Is
that long enough to time a Samba connection out? I know this is not a
forum for network-related issues, but if you have experience in this
area, I'd really appreciate the advice.
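As a rough check on the profile-transfer arithmetic, here is a minimal Python sketch. It assumes 1KB = 1000 bytes, 8 bits per byte, and an ideal shared link with no protocol overhead, retries, or disk delays, so the result is a lower bound and not a measurement. The function name transfer_seconds is hypothetical.

```python
def transfer_seconds(profile_kb, users, link_mbps=100.0):
    """Ideal time to send one profile to each user over a shared link.

    Assumes 1 KB = 1000 bytes and ignores protocol overhead and contention.
    """
    total_bits = profile_kb * 1000 * 8 * users
    return total_bits / (link_mbps * 1_000_000)


# 810KB profile, for 10 users and for a full class of 35
for users in (10, 35):
    print(f"{users} users: {transfer_seconds(810, users):.2f} s")
```

Under these assumptions even 35 simultaneous logons need only a couple of seconds of raw wire time, so the bandwidth figure alone may not explain the timeouts.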