[Samba] [Very Strange] Windows Networking suddenly stopped working

Carsten Menke bootsy52 at gmx.net
Fri Nov 12 05:32:46 GMT 2004


Hi list,

I hope that one of you can shed some light on this. It is a very
strange case, and I don't have the slightest clue where these symptoms
come from. Maybe it is not Samba; maybe it is hardware, maybe buggy
Windows ...

Problem:

We have been running a Samba 3.0.7 (from backports.org) server on
Debian 3.0 stable here for over a year now. On Tuesday this week we got
the first report from one user that the Windows Network Neighborhood
was not accessible; at first we did not think it was a real problem.
On Wednesday two more reports came in from different users on different
machines. And finally, yesterday, another five reports (all within the
same company, using the same server).

NOTE: The error messages that follow were translated from a German
localized version of Windows XP Professional, so the wording may differ
in the English version.

So we started looking into the problem. The error message shown when
trying to access the Windows Network Neighborhood was:

"The network is not existent or was not started".

It turned out that the "Server" and "Computer Browser" services were
not started. Trying to start them manually ended in a timeout. The
Event Log showed nothing.

And now the fun begins ...

We first suspected a network problem. To isolate it, we connected one
PC directly to the Samba server with a crossover cable and gave the PC
a static IP address (normally we use DHCP). Even then the error message
was the same. More strangely, if you pulled out the network cable
completely, the computer and all services started normally. After
plugging the cable back in, the same problems arose.

Unfortunately, it sometimes works *with* the network cable plugged in,
but then a minute later it doesn't.

In the next step we replaced the NIC of the server with a new one,
thinking that would solve the problem. (Since replacing the NIC
requires a complete reboot, that was done as well. I also ran
tdbbackup -v *.tdb, which showed everything is OK, and manually removed
browse.dat.) The first try succeeded, but the second try gave the same
result again.

We also found out that this problem is bound to the computer, not the
user, as the affected user can log on at another computer without any
problems. All computers were running Windows XP Professional SP1a.
Although there is a virus scanner (CA Etrust Inoculan) with up-to-date
signatures on the computers, we scanned the computers in question with
two additional antivirus packages, 1. H+BEDV AntiVir and 2. Kaspersky;
all scanners reported the computers as clean.
Running "nbtstat -RR" did not solve the problem either.

Next, our MCSE decided to install Windows XP SP2 on the computers in
question and, guess what, that has solved the problem so far.

So my question is: what is the **REAL** problem we are seeing here? I
don't believe that SP2 is the actual solution. Normally I wouldn't
worry if a single computer showed this odd behavior, but the growing
number of computers showing the same symptoms within 3 days does make
me nervous.


I have looked through the Samba log files. They show the messages below
and also, rather frequently, "No route to host". There is no router
between the PCs and the server; they are connected via a 3Com Super
Stack III switch.

Logfile:

[2004/11/12 02:34:14, 0] lib/util_sock.c:get_peer_addr(1000)
   getpeername failed. Error was Transport endpoint is not connected
[2004/11/12 02:34:14, 0] lib/util_sock.c:write_socket_data(430)
   write_socket_data: write failure. Error = Connection reset by peer
[2004/11/12 02:34:14, 0] lib/util_sock.c:send_smb(647)
   Error writing 4 bytes to client. -1. (Connection reset by peer)
[2004/11/12 02:34:14, 0] smbd/service.c:make_connection(800)
   neckar (192.168.1.65) couldn't find service user
[2004/11/12 02:38:40, 0] rpc_server/srv_util.c:get_alias_user_groups(219)
   get_alias_user_groups: gid of user xxx doesn't exist. Check your /etc/passwd and /etc/group files
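
To see which code paths and which clients these level-0 messages come
from, the log could be tallied with something like the following. This
is only a minimal Python sketch, assuming the two-line entry layout
shown above (a bracketed header line followed by an indented message
line). The function name tally_samba_log and the embedded sample text
are made up for illustration and are not part of Samba.

```python
import re
from collections import Counter

# Header line of a log entry in the layout shown above, e.g.
# [2004/11/12 02:34:14, 0] lib/util_sock.c:send_smb(647)
HEADER = re.compile(
    r"^\[(?P<stamp>[^,\]]+),\s*(?P<level>\d+)\]\s+"
    r"(?P<file>[^:\s]+):(?P<func>\w+)\((?P<line>\d+)\)"
)
# IPv4 address in parentheses, e.g. "neckar (192.168.1.65)"
IP = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")


def tally_samba_log(text):
    """Count entries per reporting function and per client IP.

    Hypothetical helper: assumes each entry is a header line followed
    by one or more message lines.
    """
    by_func = Counter()
    by_ip = Counter()
    in_entry = False
    for raw in text.splitlines():
        match = HEADER.match(raw)
        if match:
            by_func[match.group("func")] += 1
            in_entry = True
        elif in_entry and raw.strip():
            # Message line belonging to the current entry.
            by_ip.update(IP.findall(raw))
    return by_func, by_ip


if __name__ == "__main__":
    sample = """\
[2004/11/12 02:34:14, 0] lib/util_sock.c:send_smb(647)
   Error writing 4 bytes to client. -1. (Connection reset by peer)
[2004/11/12 02:34:14, 0] smbd/service.c:make_connection(800)
   neckar (192.168.1.65) couldn't find service user
"""
    funcs, ips = tally_samba_log(sample)
    print(funcs.most_common())
    print(ips.most_common())
```

Fed with the real log file contents instead of the sample, the two
counters would show whether the errors cluster on a few machines or are
spread over all clients.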

I double-checked the get_alias_user_groups message: the gid *is* in
/etc/group and the user ID also exists.
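
A check like this can also be done mechanically rather than by eye. The
following is a rough Python sketch, assuming the standard
colon-separated /etc/passwd and /etc/group layouts. It only looks at
the flat file contents it is given, so it will not see accounts that
come from NIS, LDAP or winbind, and the helper names are invented here.

```python
def parse_group_gids(group_text):
    """Return the set of numeric gids defined in /etc/group-style text."""
    gids = set()
    for line in group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Layout: name:password:gid:members
        if len(fields) >= 3 and fields[2].isdigit():
            gids.add(int(fields[2]))
    return gids


def users_with_missing_primary_group(passwd_text, group_text):
    """List (user, gid) pairs whose primary gid has no group entry."""
    gids = parse_group_gids(group_text)
    missing = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Layout: name:password:uid:gid:gecos:home:shell
        if len(fields) >= 4 and fields[3].isdigit():
            gid = int(fields[3])
            if gid not in gids:
                missing.append((fields[0], gid))
    return missing


if __name__ == "__main__":
    # Made-up sample data; on a real system the two strings would be
    # the contents of /etc/passwd and /etc/group.
    passwd_sample = (
        "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n"
        "bob:x:1001:2000:Bob:/home/bob:/bin/bash\n"
    )
    group_sample = "users:x:1000:alice\n"
    for name, gid in users_with_missing_primary_group(passwd_sample, group_sample):
        print(f"{name}: primary gid {gid} has no entry in group file")
```

An empty result for the real files would agree with what I found by
hand, which would suggest the message has some other cause.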


I'd be glad for any hint I can get, as it seems this could turn out to
be a real problem.

Regards

Carsten

