[Samba] Tracking down random server drops...

Josh Coombs jcoombs at staff.gwi.net
Thu Aug 26 08:56:33 MDT 2010


I'm turning my logging up to lvl 3, it looks like I'm seeing the symptom,
not the cause the more I look at this.  Based on last night's failures I
have the following pattern:

At 9:24pm ntbackup on ravenclaw begins writing to the existing file
RAVECLAW.bkf, so far nothing logged at lvl 2 yet...

at 7:04am log.ravenclaw shows:
[2010/08/26 07:04:35.538218,  2] smbd/sesssetup.c:1390(setup_new_vc_session)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all
old resources.
[2010/08/26 07:04:35.576723,  2] smbd/sesssetup.c:1390(setup_new_vc_session)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all
old resources.
[2010/08/26 07:04:35.697538,  2] auth/auth.c:304(check_ntlm_password)
  check_ntlm_password:  authentication for user [backup] -> [backup] ->
[backup] succeeded
[2010/08/26 07:04:35.700101,  1] smbd/service.c:1070(make_connection_snum)
  ravenclaw (192.168.1.8) connect to service backups initially as user
backup (uid=34, gid=34) (pid 30350)
[2010/08/26 07:04:35.701039,  2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=1)
[2010/08/26 07:04:35.702556,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK

The open and closed chatter for that file repeats until 7:06:47am...
[2010/08/26 07:06:36.254759,  2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/26 07:06:47.640491,  1] smbd/service.c:1251(close_cnum)
  ravenclaw (192.168.1.8) closed connection to service backups
[2010/08/26 07:08:12.795026,  0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.795070,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Connection
reset by peer
[2010/08/26 07:08:12.814075,  0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.816315,  0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.816325,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.816333,  0] smbd/process.c:79(srv_send_smb)
  Error writing 55 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.817211,  0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.817219,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.817226,  2] smbd/process.c:2202(keepalive_fn)
  Keepalive failed - exiting.
[2010/08/26 07:08:12.817339,  0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.817348,  0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.817355,  0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.941965,  2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.941991,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942010,  2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942020,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942037,  2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942046,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942065,  2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942075,  2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942136,  2] smbd/close.c:656(close_normal_file)
  backup closed file Windows Servers/Ravenclaw/RAVENCLAW.bkf (numopen=0)
NT_STATUS_OK
[2010/08/26 07:08:12.942159,  1] smbd/service.c:1251(close_cnum)
  ravenclaw (192.168.1.8) closed connection to service backups

This is followed by the cleanup messages in log.smbd:
[2010/08/26 07:08:13.139151,  1] smbd/server.c:267(remove_child_pid)
  Scheduled cleanup of brl and lock database after unclean shutdown
[2010/08/26 07:08:33.152490,  1] smbd/server.c:240(cleanup_timeout_fn)
  Cleaning up brl and lock database after unclean shutdown

Ravenclaw reports the backup failed at 7:06am, error writing.  So something
happened between Samba and the 2k3 box at 7:04am that triggered the new
session, but I don't have anything to indicate a reason.  I have eventlog
entries on Ravenclaw from after the fact noting Delayed Write Failed, but
nothing leading up to the problem.

Josh C


More information about the samba mailing list