[Samba] PDB files and "Delayed Write Failed"

James Chamberlain jamesc at exa.com
Wed Mar 4 16:58:22 GMT 2009


Hello Samba Community,

I have what is probably a fairly unusual problem.  Allow me to explain:

Background:
We build software for Windows, among other things.  Most of our developers 
are not on Windows, but they need to do Windows builds.  To facilitate 
this, we've set up a complex build system where calling "make" 
automatically connects (rsh/ssh) to a cmd shell on the Windows build 
server, translates our Makefile into something more suitable for Windows, 
and executes the build.  The source code is not on the build server's local 
disks, but is instead sitting on a file server which the build server 
accesses through Samba.  This leads to the problem.


The Problem(s):
We're seeing mysterious and unpredictable problems in this environment. 
Looking through the Event Viewer, we've seen 2658 "Delayed Write Failed" 
messages since October.  Only 19 of them did not relate to ".pdb" files. 
The Samba logs don't indicate a problem.
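
For anyone who wants to repeat that tally on their own Event Viewer export, here is a minimal Python sketch.  It assumes each event's message text has the form "... for the file <path>. The data has been lost."; that wording and the function name tally_extensions are assumptions for illustration, so adjust the pattern to whatever your export actually contains.

```python
import ntpath
import re
from collections import Counter

# Assumed message shape: "... for the file <path>. The data has been lost."
# Adjust this pattern if the exported message text differs.
_FILE_RE = re.compile(r"for the file (.+?)\. The data", re.IGNORECASE)


def tally_extensions(messages):
    """Count "Delayed Write Failed" messages by file extension.

    `messages` is any iterable of event message strings.
    """
    counts = Counter()
    for message in messages:
        match = _FILE_RE.search(message)
        if match is None:
            # Keep track of messages the pattern could not parse.
            counts["(unparsed)"] += 1
            continue
        # ntpath handles backslash-separated Windows and UNC paths
        # regardless of the platform this script runs on.
        extension = ntpath.splitext(match.group(1))[1].lower()
        counts[extension or "(no extension)"] += 1
    return counts
```

Calling tally_extensions() on the exported messages and printing the result of most_common() shows at a glance whether one file type dominates, as ".pdb" does here.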

We're getting messages from the compiler that it can't find header files 
which definitely exist and are definitely in the include path.  We're also 
getting the occasional "gmake: *** Makefile: Permission denied.  Stop." 
message.  Simply starting the "make" again without changing any permissions 
allows the build to continue.


Build Server:
* Windows Server 2003 SP2
* 4x 3 GHz Xeon (5160)
* 4 GB RAM
* 2x 10k RPM SAS drives, hardware RAID 1

File Server:
* CentOS 5.2
* 8x 3 GHz Xeon (5450)
* 4 GB RAM
* 14x 15k RPM SAS drives, hardware RAID 6
* Samba 3.0.25b-1.el5_1.4
* Authenticates against Windows domain controller(s)


What I've tried already (not necessarily in this order):
* Rebooted the Build Server.
* Swapped OSs on the Build Server.  We started with NT, then moved to XP
   and are now on Server 2003.
* Swapped Ethernet cable on the Build Server.
* Swapped Ethernet switch port for the Build Server.
* Swapped Ethernet switch for the Build Server.
* Swapped Ethernet NIC on the Build Server.
* Swapped the Build Server hardware itself.
* Switched from explicitly mapping drives at the start of each remote cmd
   session to using UNC paths.

* Swapped OSs on the File Server.  We started with Red Hat Linux 8 for i386
   and have moved up through several iterations to CentOS 5.2 for x86_64.
* Swapped Ethernet cable on the File Server.
* Swapped Ethernet switch port for the File Server.
* Swapped Ethernet switch for the File Server.
* Swapped Ethernet NIC on the File Server.
* Swapped the File Server hardware itself.
* Upgraded to the latest version of Samba available from the CentOS team.
   This broke domain authentication for us, so we rolled back to 3.0.25.

* Added a backup domain controller.  (This is still an NT4 domain
   environment.  Yes, I know, I'm working on it.)

* Changed the Samba socket options from "TCP_NODELAY SO_RCVBUF=8192
   SO_SNDBUF=8192" to "TCP_NODELAY IPTOS_LOWDELAY".
* Set "large readwrite = no"
* Set "write raw = no"
* Explicitly turned on oplocks and level2 oplocks, though I believe they
   are on by default.
* Set "dos filetimes = yes"
* Set "fake directory create times = yes"
* Set "dos filetime resolution = yes"
* Set "allocation roundup size = 0"
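
Put together, the Samba settings listed above would look roughly like the following smb.conf fragment.  This is a sketch rather than a copy of the real file: placing everything under [global] is an assumption, since several of these parameters can also be set per share.

```ini
[global]
    # Previously: socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
    socket options = TCP_NODELAY IPTOS_LOWDELAY
    large readwrite = no
    write raw = no

    # Believed to be the defaults, but set explicitly
    oplocks = yes
    level2 oplocks = yes

    dos filetimes = yes
    fake directory create times = yes
    dos filetime resolution = yes
    allocation roundup size = 0
```

Running "testparm -s" against the file shows which values Samba actually ends up using.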

Thus far, any time we've managed to improve performance back to the expected 
level, it has been unclear what did the trick... and it didn't last.  If 
anyone has any thoughts on other things I can try, I would certainly 
appreciate it.  If there's any further information that would help in 
making an assessment, I'd be happy to post what I can.

Thanks,

James

