[Samba] High load average and client timeouts
Progman2000 at usa.net
Sat Jan 10 18:55:55 GMT 2004
I am setting up a proof-of-concept backup server at my office. The
goal is for a dozen or so of our ~200 workstations to dump images
(like PowerQuest DeployCenter, not JPEG) to a 2 TB RAID 5 at reasonable
speeds. The testbed (whose specs are listed below) is, I admit,
grossly lacking in RAM. I still think it should handle at least two
or three systems at once without choking. But on to that...
Most of the machines we've used as test clients can dump to the
system at very high speeds without a problem. CPU utilization hits
~10%, and the load average stays between 0.1 and 0.3 (generally). RAM
usage never seems to move much; there is always about 3 MB free with
little or no swap used.
When two systems hit it, CPU usage doesn't change much, but the load
average starts climbing. It /may/ stabilize near 0.8. Three almost
always push it over 1.0. Once that happens, it keeps climbing until the
clients time out and abort. By then, the load average has hit 6-14
(depending on how many machines were transferring at once).
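On Linux the load average counts runnable tasks plus tasks in
uninterruptible sleep (state D), which is usually time spent waiting on
disk I/O. A load that climbs while CPU stays near 10% is therefore
consistent with smbd processes queuing behind the disk. A minimal Python
sketch, assuming a Linux /proc layout (the helper names are hypothetical),
for checking which processes sit in D state while a transfer stalls:

```python
import os


def parse_loadavg(line):
    """Return the 1-, 5- and 15-minute load averages from a /proc/loadavg line."""
    fields = line.split()
    return tuple(float(f) for f in fields[:3])


def parse_stat(line):
    """Return (command, state) from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # state letter follows ") "
    return comm, state


def blocked_processes(proc="/proc"):
    """List command names of processes in uninterruptible sleep ('D')."""
    blocked = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process vanished or its stat was unreadable
        if state == "D":
            blocked.append(comm)
    return blocked


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load average:", parse_loadavg(f.read()))
    print("blocked in D state:", blocked_processes())
```

If most of the D-state entries are smbd (or kernel flush threads) while
the load climbs, the disk rather than the CPU or network is the likely
bottleneck.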
Some systems can cause this overload on their own. All the clients
use 100 Mbps connections (some switched, some on hubs). No real
pattern has emerged. Image files are currently split every 130 MB,
and no verification is done until the whole image is written.
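A rough upper bound helps show which component should saturate first.
Each 100 Mbps client can deliver at most 12.5 MB/s before protocol
overhead (clients sharing a hub split one segment between them), so three
clients can already offer about 37.5 MB/s. The sketch below is
illustrative only: the function names are hypothetical, and the disk
write rate is an assumption you would have to measure on the testbed.

```python
def client_inflow_mb_s(clients, link_mbps=100):
    """Upper bound on aggregate inbound data rate in MB/s (10**6 bytes),
    ignoring TCP/IP and SMB protocol overhead."""
    return clients * link_mbps / 8


def bottleneck(clients, link_mbps=100, server_nic_mbps=1000, disk_mb_s=None):
    """Return (limiting component, MB/s) for `clients` simultaneous uploads.

    disk_mb_s is the sustained write rate of the target disk; it is an
    assumed, measured value, not something known in advance.
    """
    limits = {
        "clients": client_inflow_mb_s(clients, link_mbps),
        "server NIC": server_nic_mbps / 8,
    }
    if disk_mb_s is not None:
        limits["disk"] = disk_mb_s
    name = min(limits, key=limits.get)  # the slowest stage caps throughput
    return name, limits[name]
```

For example, `bottleneck(3)` gives `("clients", 37.5)`, while
`bottleneck(12)` gives `("server NIC", 125.0)`: a dozen 100 Mbps clients
can in principle offer more than one gigabit port accepts. Passing a
measured `disk_mb_s` below the client figure shows whether the disk is
the real limit.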
Transfers *from* the testbed use almost no CPU time, and the load
average never gets much over 0.2, even with multiple clients.
My only theory so far is that Samba is filling up the write
cache/buffers faster than they can be emptied to the HD. On that
premise, I've tried to speed up the filesystem and slow down Samba a
bit (for instance, by taking "SO_RCVBUF=8192 SO_SNDBUF=8192" out of
the "socket options"). That does not seem to have helped noticeably.
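The write-cache theory can be put in numbers with a deliberately
simplified model: while data arrives faster than the disk drains it, the
cache fills at the difference between the two rates, and once it is full
every writer is throttled to disk speed. All three inputs below are
hypothetical values, not measurements from this testbed.

```python
import math


def seconds_until_cache_full(inflow_mb_s, disk_mb_s, cache_mb):
    """Simplified model: how long a write-back cache of `cache_mb` lasts
    when data arrives at `inflow_mb_s` and drains to disk at `disk_mb_s`.

    Returns math.inf if the disk keeps up (the cache never fills).
    All three parameters are assumptions to be measured, not known values.
    """
    surplus = inflow_mb_s - disk_mb_s
    if surplus <= 0:
        return math.inf
    return cache_mb / surplus
```

With illustrative figures of three clients (37.5 MB/s), a disk sustaining
30 MB/s and 60 MB of cache, `seconds_until_cache_full(37.5, 30, 60)` is
8.0 seconds; after that every smbd write waits on the disk. More RAM
lengthens that grace period but does not change the sustained rate, and
shrinking the socket buffers only slows the inflow slightly.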
I've captured a tcpdump of one particular client transferring and
then aborting, if that would help. It's a 9 MB .tgz that I can
make available on request.
The Microsofties running the department are pushing for an NT5
"server", so we're comparing this testbed to a little 500 MHz
recycled workstation (also with 128 MB of RAM). On the Linux testbed,
all five clients aborted after the connections stalled. The same
five clients kept going when talking to the NT box (albeit at a
slightly slower speed).
While management doesn't much care about the outcome of this test
(they'd want the NT box if it took two days to write ten bytes), I
want the best system to win. I just can't believe that a Linux/Samba
system is unable to out-perform a runty Windows box.
One nagging question is what the "real" server's performance would
be. We have spec'd dual Athlon MP 2200+ CPUs, a 3ware 7506-12
controller with twelve 200 GB Western Digital drives, and 4 GB of RAM.
(The whole thing is $6,000!!) The thing is, I don't think the RAID would
be much faster at writing than the existing IDE drive. I'd hate to blow
six grand and find out it doesn't perform any better.
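Backup images should be mostly large sequential writes, which is the
favourable case for RAID 5: full-stripe writes can compute parity without
reading anything back, while small partial-stripe writes pay a
read-modify-write penalty. A sketch of the theoretical ceilings, where the
per-disk figures are hypothetical inputs and the function names are
illustrative:

```python
def raid5_full_stripe_write_mb_s(disks, per_disk_mb_s):
    """Theoretical ceiling for large sequential writes on RAID 5: one disk's
    worth of every stripe holds parity, so n-1 disks carry user data."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * per_disk_mb_s


def raid5_small_write_iops(disks, per_disk_iops):
    """Ceiling for small random writes: each one costs four disk I/Os
    (read old data, read old parity, write new data, write new parity)."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return disks * per_disk_iops / 4
```

With an assumed 40 MB/s per drive, `raid5_full_stripe_write_mb_s(12, 40)`
gives 440 MB/s. That is only a ceiling; controller, bus and filesystem
overheads cut into it, and a single gigabit link (about 125 MB/s) would
likely cap this workload well before the array does.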
Has anyone dealt with a similar problem before? Did I overlook some
obvious "DontChokeAtHighSpeeds" option?
Linux 2.4.22 (custom)
2.2 GHz Intel Celeron
60 GB Maxtor 6Y060L0 on UltraATA/133
128 MB RAM, 256 MB swap
# Will try to add RAM next week
On-board Intel Pro/1000 (Gigabit) NIC
All partitions are on LVM except swap
Path     Type  Size   Free
/        ext3  6 GB   5.1 GB
/files   ext2  30 GB  9 GB
<Others left off, not relevant>
/etc/samba/smb.conf: (selected highlights)
    security = domain
    socket options = TCP_NODELAY
    max xmit = 8192

    comment = Backup test storage
    path = /files/Backups
    valid users = @OurDomain+Dept_ComputerRes OurDomain+BackMeUp
    public = yes
    writable = yes
    printable = no
    create mask = 0775
    guest ok = yes
Progman2000 at usa.net