NT Domain subnetting problems

Fri Mar 23 22:55:10 GMT 2001

To all,

We are not sure if this is the correct place to send this but we are also
unsure whether it is a bug or some other strange behaviour. Our apologies
in advance if this should be sent elsewhere.

We have just implemented a SAMBA server based on a SUN 420R with dual CPUs
and 1 Gbyte RAM, connected to a 1 Tbyte FCAL disk array with dual Qlogic
fibre connections. We are having inconsistent failures and/or performance
issues with a subnet of NT/Win 2K machines connected to the server. There
are five subnets with machines of differing performance on each - all
subnets are 100Mbit. The PDC/WINS server is an NT box on the same subnet as
the SAMBA server. Browsing is all set up OK: the machines can all be seen
in the Network Neighbourhood and can access the SAMBA shares from the
network. Pings etc. work fine.

However, the response and/or access from machines on different subnets is
inconsistent. For example:

1) A machine on one subnet can copy from SAMBA share to SAMBA share with
reasonable performance and consistently. A copy of about 80Mbytes takes
approx 2 minutes.
2) A machine on another subnet does the same copy and it works fine, in
about the same time. Try it a second time and the time suddenly blows out
to 30 minutes, the copy fails and the connection is lost. A look at the
machine-specific log shows (sometimes!) oplock problems, which according
to the FAQ indicate a "broken" network card or poor cabling, don't they?
However, on other occasions the log shows a different set of errors
(see below):

  roger (172.16.12.18) connect to service testarea as user kim (uid=0,
gid=300) (pid 16808)
[2001/03/23 11:04:34, 0, pid=16808, effective(0, 0), real(0, 0)]
smbd/service.c:(336)
  kim logged in as admin user (root privileges)
[2001/03/23 11:04:34, 1, pid=16808, effective(0, 300), real(0, 0)]
smbd/service.c:(550)
  roger (172.16.12.18) connect to service test4kim as user kim (uid=0,
gid=300) (pid 16808)
[2001/03/23 11:05:42, 0, pid=16808, effective(0, 300), real(0, 0)]
smbd/oplock.c:(1204)
  request_oplock_break: no response received to oplock break request to pid
16749 on port 33722 for dev = 2680062, inode = 486401
  for dev = 2680062, inode = 486401, tv_sec = 3aba92cc, tv_usec = 22e4a
[2001/03/23 11:06:14, 0, pid=16808, effective(0, 300), real(0, 0)]
smbd/oplock.c:(1204)
  request_oplock_break: no response received to oplock break request to pid
16749 on port 33722 for dev = 2680062, inode = 486401
  for dev = 2680062, inode = 486401, tv_sec = 3aba92cc, tv_usec = 22e4a
[2001/03/23 11:06:43, 0, pid=16825, effective(0, 0), real(0, 0)]
smbd/service.c:(336)
.
.
.
[2001/03/23 11:15:13, 2, pid=16749, effective(0, 0), real(0, 0)]
smbd/close.c:(159)
  kim closed file Kims Foilder/dec/avhrrpf.ch1.1nmfgl.8712.gz (numopen=1) 
[2001/03/23 11:15:13, 1, pid=16749, effective(0, 0), real(0, 0)]
smbd/service.c:(583)
  roger (172.16.12.18) closed connection to service testarea
[2001/03/23 11:15:14, 0, pid=16808, effective(0, 300), real(0, 0)]
lib/util_sock.c:(540)
  write_socket_data: write failure. Error = Broken pipe
[2001/03/23 11:15:14, 0, pid=16808, effective(0, 300), real(0, 0)]
lib/util_sock.c:(566)
  write_socket: Error writing 102 bytes to socket 8: ERRNO = Broken pipe
[2001/03/23 11:15:14, 0, pid=16808, effective(0, 300), real(0, 0)]
lib/util_sock.c:(754)
  Error writing 102 bytes to client. -1. Exiting
[2001/03/23 11:19:03, 2, pid=16837, effective(0, 300), real(0, 0)]
smbd/dosmode.c:(61)

3) Machines on the same subnet as the SAMBA server generally have no
problems and are much quicker copying. But not always!
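
A rough way to summarise log excerpts like the one in example 2 is sketched
below in Python. It is only an illustration and not part of Samba: the
function names (split_entries, summarise, decode_tv_sec) are made up for
this post, and the regular expressions assume the layout shown in the
excerpt, where every entry starts with a bracketed header containing
"pid=". The sketch counts unanswered oplock break requests per (requesting
pid, holding pid) pair, counts write failures per pid, and converts the
hexadecimal tv_sec field to a UTC time.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed header layout, taken from the excerpt: "[date time, level, pid=N, ..."
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}), *(\d+), pid=(\d+)")
UNANSWERED = re.compile(
    r"no response received to oplock break request to pid\s+(\d+)")

def split_entries(text):
    """Group lines under each bracketed header; return (timestamp, pid, body)."""
    entries = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            entries.append([m.group(1), int(m.group(3)), []])
        elif entries:
            # Continuation line (possibly mail-wrapped): attach to last header.
            entries[-1][2].append(line.strip())
    return [(ts, pid, " ".join(body)) for ts, pid, body in entries]

def summarise(text):
    """Count unanswered oplock breaks and socket write failures."""
    unanswered = Counter()    # key: (requesting pid, pid holding the oplock)
    write_failures = Counter()  # key: pid that logged the failure
    for _ts, pid, body in split_entries(text):
        m = UNANSWERED.search(body)
        if m:
            unanswered[(pid, int(m.group(1)))] += 1
        if "write_socket_data: write failure" in body:
            write_failures[pid] += 1
    return unanswered, write_failures

def decode_tv_sec(hex_value):
    """Interpret a hexadecimal tv_sec value as seconds since the Unix epoch."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)

# Lines copied from the excerpt above, including the mail wrapping.
SAMPLE = """\
[2001/03/23 11:05:42, 0, pid=16808, effective(0, 300), real(0, 0)]
smbd/oplock.c:(1204)
  request_oplock_break: no response received to oplock break request to pid
16749 on port 33722 for dev = 2680062, inode = 486401
  for dev = 2680062, inode = 486401, tv_sec = 3aba92cc, tv_usec = 22e4a
[2001/03/23 11:06:14, 0, pid=16808, effective(0, 300), real(0, 0)]
smbd/oplock.c:(1204)
  request_oplock_break: no response received to oplock break request to pid
16749 on port 33722 for dev = 2680062, inode = 486401
  for dev = 2680062, inode = 486401, tv_sec = 3aba92cc, tv_usec = 22e4a
[2001/03/23 11:15:14, 0, pid=16808, effective(0, 300), real(0, 0)]
lib/util_sock.c:(540)
  write_socket_data: write failure. Error = Broken pipe
"""

unanswered, write_failures = summarise(SAMPLE)
print(unanswered, write_failures, decode_tv_sec("3aba92cc").isoformat())
```

On these lines the sketch reports two unanswered break requests from pid
16808 to pid 16749 and one write failure for pid 16808, and tv_sec 3aba92cc
decodes to 2001-03-23 00:03:24 UTC.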

Routing seems fine: ping, ftp and telnet all work between the subnets, to
both the client machines and the server. The routing tables look correct
and we have traced the routes between subnets without finding a problem.
When something goes wrong the client machine freezes, the SAMBA connection
"appears" to be lost and the copy fails. However, checking the SAMBA status
shows that the session for the machine is still connected, as are the
shares. After multiple failures there can be a number of sessions connected
to the SAMBA machine, which causes additional problems to appear.
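
For reference, 80 Mbytes in about 2 minutes is roughly 0.67 Mbyte/s, while
the same copy taking 30 minutes is roughly 0.04 Mbyte/s. To make the "fine
the first time, slow the second time" pattern easier to reproduce and
compare between subnets, repeated copies could be timed with something like
the Python sketch below. The function name time_copies and the example
paths are hypothetical, and the sketch assumes it runs on a client with the
shares already mapped.

```python
import shutil
import time
from pathlib import Path

def time_copies(src, dst, runs=3):
    """Copy src to dst `runs` times; return a (seconds, Mbyte/s) pair per run."""
    size_mb = Path(src).stat().st_size / 1_000_000
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        shutil.copyfile(src, dst)
        # Guard against a zero interval on very small files.
        elapsed = max(time.perf_counter() - start, 1e-9)
        results.append((elapsed, size_mb / elapsed))
    return results

# Example call with hypothetical mapped drives for the two test shares:
#   for n, (secs, rate) in enumerate(time_copies(r"S:\test.gz", r"T:\test.gz"), 1):
#       print(f"run {n}: {secs:.1f} s, {rate:.2f} Mbyte/s")
```

Running the same script from one client on each subnet would give
comparable figures for each attempt rather than rough estimates.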

A section of the smb.conf file is included below, with the global
definitions and the definitions for two test shares:

# Samba config file created using SWAT
# from montezuma (172.16.16.11)
# Date: 2001/03/23 10:45:47

# Global parameters
[global]
	workgroup = AGRECON
	server string = Caesar Samba Server
	security = SERVER
	password server = 172.16.16.11
	debug level = 2
	log file = /var/opt/samba/log.%m
	max log size = 10000
	debug pid = Yes
	debug uid = Yes
	name resolve order = wins host  lmhosts bcast
	dns proxy = No
	wins server = 172.16.16.11
	invalid users = root bin daemon adm sync shutdown halt mail news uucp operator gopher
	admin users = kim
	level2 oplocks = Yes
.
.
.
[test4kim]
	path = /users/kim
	valid users = kim
	writeable = Yes
	create mask = 0755
	inherit permissions = Yes

[testarea]
	path = /tmp/testarea
	valid users = kim
	writeable = Yes
	create mask = 0755
	inherit permissions = Yes

Any suggestions on possible problem areas and/or solutions would be
gratefully received.

Thank you in advance.

Kim Malafant
Director, compleXia
PO Box 3011, Belconnen
ACT, Australia, 2617
Phone: (02) 6253 8342
Fax:   (02) 6253 8346