[Samba] How Samba let us down
Chris de Vidal
cdevidal at yahoo.com
Wed Oct 23 06:15:02 GMT 2002
Before you read this, I want to state (for reasons
listed below) that I don't expect an answer (advice is
welcomed, but please read this email carefully before
answering). I'm sharing this with the community with
the hope that better software results from our sad
I've been using NT for 4 years, Netware and Linux for
3 years, and Samba for almost 2. I work in the IT
department of a medium-sized unit of a global
advertising company. We have a Netware and NT
environment with a bit of Linux.
We installed a 280GB IDE Samba archive server (rare
usage) and a 15GB SCSI Mac/Samba file server (medium
usage). We also use Samba for more menial tasks like
smbmounts and file transfers. We thought we were
comfortable with Samba. We knew we were comfortable
with other types of file servers.
Going from my tired memory:
Athlon MP 1.8GHz (mem=nopentium)
2GB ECC SDRAM
Tyan S2460 (I think?)
Antec 450W PS
Lots of cooling
5 IBM DeskStar 120GB drives with 8MB caches in RAID 5
3ware 7580 (I think?) 8-port hardware RAID
3ware hot-swappable drive cages
Intel e1000 Gigabit NIC, full duplex, 1000MBit,
3com Gigabit switch, autonegotiation off
Kernel 2.4.19 with ACL support
ext3 with ACL support
Samba 2.2.5 with ACL support installed from a
recompiled SRPM from the samba.org FTP site.
NO nfs daemon (I hear it's buggy w/ ACLs)
We have a variety of clients, from DOS and OS/2 to
Windows (9x-2000) and Linux. The server acts as a
print spooling area (the actual queues are on an NT
server) and scratch area for database programmers to
manipulate their flat database files. As far as I
know, these files are not commonly accessed by more
than one user at a time.
For the past year, our most heavily used Netware server
has been under more and more stress: filling up,
running out of licenses, slowing down, etc.
Preliminary tests using Samba on a fast Linux box
showed anywhere from 70% to 1000% speed improvements,
depending on the task. The decision was made to
switch it to Linux; the whole company is migrating
away from Netware and we (as a unit, not speaking for
the company) don't want to be completely trapped into
Windows if we can help it.
The new hardware arrived and more preliminary tests
indicated all looked good. We were set to switch last
Saturday night. We turned off logins to the Netware
box, backed it up, restored it to the new Linux box,
set permissions, then made sure the various computers
in the building could log in.
Yesterday, our first day, was rough. For most of the
day we fought random slow browsing with no
explanation. Clients would appear to lock up for
several seconds. We found some misconfigurations in
smb.conf but the problems reappeared. No errors
appeared in any machine's logs at debug level 2. I
trimmed the smb.conf to a minimal number of options
and that seemed to help with the slowness. Today,
however, the problem reappeared a few times with no
errors in the logs that we could see.
The printers were missing some of the records sent to
them to print, something that had never happened with
Netware. Every time the missing records were
different. Occasionally, it would work right.
Oplocks (kernel, level I and II) were left at their defaults.
Sadly, tonight we are installing a Windows NT server.
Installing a brand new server is actually cheaper for
us than the 8 or so hours of downtime to back up the
server, install NT on it, and restore the data to it.
We don't want to revert to Netware because so many
clients have been reconfigured to log on only to the
domain (DOS, OS/2, etc.) and that would require many
more hours reversing those changes. Also, some files
have been added since leaving Netware. We also
decided to go with NT because it is more proven in
To be fair, the problems could be related to some
misconfiguration. I have pasted the smb.conf below.
I fear it might just be an oplock problem, but it is
not clear what would result if more than one user
happened to try to write to a file with them disabled.
All the advice we found said to leave them on to
prevent corruption and to improve performance. We ran
out of time to test it, and feared what failure would
bring. Running this:
grep -r -B5 -A5 oplock /var/log/samba/ | grep -B5 -A5
produced only 5 of these errors
oplock_break: receive_smb error (Connection reset by
from the same DOS machine, out of 2 days' worth of all
machines' logs running at debuglevel 1 (some at level
2). I don't know if that is a good indicator of an
oplock problem. I can do other greps on request.
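
For anyone who wants to test the oplock theory away from
production, here is a minimal sketch of the smb.conf settings
involved. The share name and path are hypothetical and do not
come from our server; the parameter names are the standard
Samba ones, but check the smb.conf man page for your version
before relying on them.

```
# Hypothetical test share; name and path are illustrative only.
[oplocktest]
   path = /share/oplocktest
   writeable = yes
   # Stop granting exclusive (level I) oplocks on this share.
   oplocks = no
   # Stop granting read-only (level II) oplocks as well.
   level2 oplocks = no
   # Alternative to the two lines above: keep oplocks on but
   # exclude certain files. The pattern is an assumption.
   ; veto oplock files = /*.dbf/
```

With oplocks off, clients should stop caching file data
locally, so access will probably be slower, which makes it a
trade-off to measure rather than an obvious fix.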
Unfortunately, we can't test out your suggestions in
production, and our off-production testing apparently
can't stress it well enough. So please just take this
email as input - I'm not looking for answers here,
though advice is appreciated.
The problem could also have been environment or
hardware. We should know soon, as we are going to
reinstall the original Samba server with NT, and the
problems should reappear if hardware or environment is to blame.
If we do find that to be true, I will certainly reveal
our findings to this mailing list.
And perhaps the problem was with ACLs. We couldn't
turn them off in production to test that theory.
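
A possible way to test the ACL theory outside production is
to switch off Samba's NT ACL mapping on a scratch share. This
is only a sketch: the share name and path are hypothetical,
and whether "nt acl support" is accepted per share or only in
[global] depends on the Samba version, so check the smb.conf
man page first.

```
# Hypothetical test share; name and path are illustrative only.
[acltest]
   path = /share/acltest
   writeable = yes
   # Do not map UNIX permissions and ACLs to NT ACLs for clients.
   nt acl support = no
```

This only changes what Samba presents to clients. It does not
remove ACL support from the kernel or from ext3, so it would
narrow the problem down rather than rule ACLs out entirely.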
It is likely that we will try Samba in this capacity
again in the future with a more mature version.
Thanks for listening,
server string =
workgroup = <our domain>
password server = <our PDC>
security = domain
encrypt passwords = yes
smb passwd file =
veto files = /lost+found/
winbind uid = 10000-20000
winbind gid = 10000-20000
winbind separator = +
create mask = 660
force create mode = 660
directory mask = 0770
force directory mode = 0770
log file =
debuglevel = 2
path = /share/print
writeable = yes