NT4 IP stack failure under heavy load when using smbfs
Landkreis Unterallgäu - EDV
edv at lra.unterallgaeu.de
Fri Nov 23 01:51:02 GMT 2001
I am trying to use Samba as a backup solution for our NT4 servers. At 03:05
every night we automatically shut down the servers, boot DOS, and use
Symantec Ghost to create an offline disaster recovery image of the NT4 SCSI
disks (NTFS) on a separate IDE disk (FAT32). The downtime (i.e. the time
spent in DOS) is about 15-20 minutes. Then the machine boots NT4 again. The
image (one or two files of at most 2G each) is now located on the FAT32 IDE
disk. To access the FAT32 filesystem under NT4 we use the Winternals FAT32
for NT driver. Up to this point everything works fine.
The next step is to transfer the images to our Linux server. This server has
320G of IDE disk space, enough for many backup generations, and here I have
22h/day to store my files on tape without putting any load on the network
during office hours. First I tried to transfer the files with rsync. Rsync is
too slow, and the Cygwin port to NT is a bit difficult to run reliably. Then
I tried Unison. Unison is too slow and too buggy. Then I tried Samba (smbfs).
It is a fast and reliable solution. Or so it seemed ...
I decided to test the transfer with eight NT servers. I created a script
with a loop that mounts (with smbfs) the share backup$ (the IDE disk) on
server1 to /mnt, copies the files to the local machine, unmounts the share,
mounts the share backup$ on server2, and so on up to server8. A first test
was successful, so I decided to run this script as a cron job.
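
For readers who want to reproduce the setup, the loop described above could be sketched roughly as follows. This is not the original script, only a minimal Python reconstruction under stated assumptions: the destination directory /backup and the credentials file path are hypothetical, and the mount options would need adjusting to the local smbfs setup. By default it only builds and lists the commands (dry run) and executes nothing.

```python
import shlex
import subprocess

SERVERS = [f"server{n}" for n in range(1, 9)]  # server1 .. server8, as in the post
SHARE = "backup$"                              # share name from the post
MOUNT_POINT = "/mnt"                           # mount point from the post
DEST_ROOT = "/backup"                          # hypothetical local destination
CREDENTIALS = "/etc/samba/backup.cred"         # hypothetical credentials file


def transfer_commands(server):
    """Return the mount, copy and unmount commands for one server."""
    unc = f"//{server}/{SHARE}"
    return [
        ["mount", "-t", "smbfs", "-o", f"credentials={CREDENTIALS}", unc, MOUNT_POINT],
        ["cp", "-rp", f"{MOUNT_POINT}/.", f"{DEST_ROOT}/{server}/"],
        ["umount", MOUNT_POINT],
    ]


def transfer(server, dry_run=True):
    """Mount the share, copy its contents, and always try to unmount again."""
    mount_cmd, copy_cmd, umount_cmd = transfer_commands(server)
    done = []

    def step(cmd):
        done.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)

    step(mount_cmd)
    try:
        step(copy_cmd)
    finally:
        step(umount_cmd)  # unmount even if the copy failed
    return done


def run_all(servers=SERVERS, dry_run=True):
    """Process the servers one after another, as the cron job did."""
    done = []
    for server in servers:
        done.extend(transfer(server, dry_run=dry_run))
    return done


if __name__ == "__main__":
    for cmd in run_all(dry_run=True):
        print(shlex.join(cmd))
```

Processing the servers strictly one after another means only one share is mounted on /mnt at any time, which matches the description above.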
The next morning I came to the office and my NT users on one server could
not log in. I could not explain this. I examined the problem and found
absolutely NOTHING. No messages, no errors, NOTHING. The effect was simply
the same as if I had unplugged the network cable from the server. I rebooted
the server and everything worked again.
The next morning I had the same problem with the same server. I decided to
update the network drivers on all 8 servers.
The next morning (today) I had the same problem, this time with another
server. I did some tests with our HP Procurve switches; they work correctly.
I repeatedly copied large (2G) files between the NT servers, and everything
was fine. Then I tried to copy such files from a mounted NT share to the
Linux box, and on roughly every 10th try the IP stack (or whatever it is) on
the NT box crashed completely, without any errors or messages of any kind.
The box behaves as if it were unplugged from the network.
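
Because the NT box logs nothing when this happens, one way to narrow down which copy triggers the failure is to probe the server from the Linux side after each transfer. The following is only a sketch of such a check in Python, not something used in the tests above. It assumes the NT server accepts SMB connections on TCP port 139 (the NetBIOS session service); the host name in the usage comment is a placeholder.

```python
import socket


def smb_reachable(host, port=139, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def first_unreachable(hosts, port=139, timeout=5.0):
    """Return the first host that no longer answers, or None if all do."""
    for host in hosts:
        if not smb_reachable(host, port=port, timeout=timeout):
            return host
    return None


# Example use after each copy (host name is a placeholder):
#   if not smb_reachable("server1"):
#       print("server1 stopped answering after this copy")
```

Calling the check after every copy in the test loop would show on which attempt the server stopped answering.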
I have no idea why this happens, except that NT is from Microsoft and Intel.
Do you have a better idea?
The NT4 servers are 1U rackmount machines equipped with an Asus CUR-DLS
mainboard, dual PIII866, 512MB RAM, and an ICP-Vortex RAID controller. I use
the onboard Ethernet controller (Intel server adapter) with the latest
drivers from Intel. OS is