Samba hanging up after about 15 days

Stephen Eickhoff operagost at email.com
Mon Oct 27 20:23:23 GMT 2003


I apologize for the wide margins, but I have some system output I need to post.
Has anyone had Samba VMS on a VAX hang after about two weeks of uptime?
The hung Samba process is also causing BACKUP to hang, and any process that tries to run TCPIP hangs
as well.

The system in question is a VAXstation 4000/60 with 32 MB RAM running VMS 7.1 and
TCPIP 5.1, ECO 5.

NMBD is going into MUTEX and hanging up my system so badly that I have to reboot it. 
Take a look at SHOW SYSTEM:

OpenVMS V7.1  on node ORFF  27-OCT-2003 15:02:53.91  Uptime  16 16:48:37
  Pid    Process Name    State  Pri      I/O       CPU       Page flts  Pages
20200081 SWAPPER         HIB     16        0   0 00:00:44.72         0      0
20200086 CONFIGURE       HIB     10        6   0 00:00:03.62      6644    167
20200088 IPCACP          HIB     10        6   0 00:00:00.13      6019    101
20200089 ERRFMT          HIB      8    11036   0 00:00:34.25      1784    119
2020008A CACHE_SERVER    HIBO    16       --  swapped  out  --            121
2020008B CLUSTER_SERVER  HIB      8       11   0 00:00:00.05       192    281
2020008C OPCOM           HIB      7    22437   0 00:00:47.87      6013    169
2020008D AUDIT_SERVER    HIB     10    44564   0 00:00:53.23      3290    397
2020008E JOB_CONTROL     HIB     10    28477   0 00:00:41.15     10181    163
2020008F QUEUE_MANAGER   HIB      9     8537   0 00:00:43.73     13399    549
20200090 SECURITY_SERVER HIB     10     7540   0 00:02:08.84    131238    603
20200091 SMISERVER       HIB      9       35   0 00:00:00.66      7951     71
20200092 TP_SERVER       HIB      9    96640   0 00:08:43.59     36033    158
20200093 TCPIP$TNS2      HIBO     4       --  swapped  out  --            381
20200094 TCPIP$TNS1      HIBO     4       --  swapped  out  --            399
20200095 TCPIP$INETACP   HIB      8    25761   0 00:01:03.77     16604    589
20200096 TCPIP$BIND_1    LEF      9  1182971   0 00:48:14.88    189439   1813  N
20200097 TCPIP$PORTM_1   LEF     10      110   0 00:00:00.74      7484     59  N
20200098 TCPIP$FTP_1     LEF     10      207   0 00:00:01.29      8642   1172  N
20200099 TCPIP$LBROKER_1 LEF      9  3381919   0 00:56:27.58    203993    691  N
2020009A TCPIP$METRIC_1  LEF     10   556766   0 00:13:09.13     43868    183  N
2020009B TCPIP$NFS_1     HIB      8      152   0 00:00:21.60     12167     59  N
2020009C TCPIP$MOUNTD_1  LEF     10      240   0 00:00:01.36      7027     65  N
2020009D TCPIP$NTP_1     LEF      9  1481698   0 00:02:06.88    101058    339  N
2020009F TCPIP$POP_1     HIB     10    25064   0 00:01:55.96     25131   1207  N
202000A0 SMTP_ORFF_01    HIB      6    20353   0 00:02:13.01     28122   2003
202000A3 TNT_SERVER      HIB      6    10137   0 00:09:12.23    212289   1268
20204824 SMBD_BG1152     RWAST    8      259   0 00:00:01.98      2635   2660  N
202000A5 CircleMUD       LEF      6   192364   0 00:01:37.58     24704    489
202000A6 NMBD            MUTEX    9   690146   0 03:31:10.51    117217    744
20203C27 ZAP_BRANAGEN    LEF      8     5674   0 00:00:10.00      2477    616
202000AB TNT1202000A3    LEFO     1       --  swapped  out  --            495  S
202000AF SYSTEM          LEF      5     2715   0 00:00:29.42     30293    268
202049B0 CYRIL           LEF      5    35421   0 00:04:36.08      2859   1940
20204A31 OPERAGOST       RWAST    6      579   0 00:00:01.80      1289    329
20203EB2 XOO6            LEF      9    11859   0 00:00:45.03      4273   1742
20204833 SWAT_BG1177     RWAST    6      135   0 00:00:01.62      2232   1892  N
20204934 _VTA1256:       CUR      4      821   0 00:00:04.02      3887    322
20204A39 TCPIP$SM_BG3533 LEF      8      143   0 00:00:01.57      2328   1517  N
20204743 BATCH_553       LEF      6     1360   0 00:00:05.19      1391   1288  B
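
For anyone who wants to pick the stuck processes out of a saved copy of a listing like the one above, here is a minimal Python sketch (run on another machine, not on the VAX). The function name stuck_processes and the choice of MUTEX and RWAST as the states to flag are assumptions made for illustration; nothing here is part of VMS or Samba.

```python
import re

# States to flag; chosen to match the hung processes in the listing above.
WAIT_STATES = ("MUTEX", "RWAST")

# A process line starts with an 8-digit hex PID, then the process name,
# then the state, then the numeric priority.
LINE_RE = re.compile(
    r"^([0-9A-F]{8})\s+(.+?)\s+(%s)\s+\d+\s" % "|".join(WAIT_STATES)
)

def stuck_processes(show_system_text):
    """Return (pid, name, state) for each process in a flagged state."""
    stuck = []
    for line in show_system_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stuck.append((m.group(1), m.group(2), m.group(3)))
    return stuck

# Three lines copied from the listing above.
sample = """\
202000A5 CircleMUD       LEF      6   192364   0 00:01:37.58     24704    489
202000A6 NMBD            MUTEX    9   690146   0 03:31:10.51    117217    744
20204A31 OPERAGOST       RWAST    6      579   0 00:00:01.80      1289    329
"""

for pid, name, state in stuck_processes(sample):
    print(pid, name, state)
```

Matching on the state followed by a priority number, rather than on fixed columns, keeps the sketch working even if the mail client changes the spacing.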

Action taken: first, Samba wasn't responding, so I tried to use SWAT to restart it.
SWAT hung up in the middle of bringing up the web page. So I went in and tried to
kill NMBD. Of course this didn't work. I tried killing the SWAT and SMBD processes
in the hope that this would free up something. They just went into RWAST. I tried to
run TCPIP so I could disable Samba, but it hung up before giving a prompt,
putting that process into RWAST as well.

Here's what NMBD looks like with SHOW PROCESS in SDA:

Process index: 0026   Name: NMBD   Extended PID: 202000A6
---------------------------------------------------------
Status : 00140023 res,delpen,respen,phdres,login
Status2: 00000001 quantum_resched
PCB address              81EE6B40    JIB address              81E86600
PHD address              83639800    Swapfile disk address    00000000
Master internal PID      00010026    Subprocess count                0
Internal PID             00010026    Creator internal PID     00000000
Extended PID             202000A6    Creator extended PID     00000000
State                       MUTEX    Termination mailbox          0000
Current priority                9    AST's enabled                KESU
Base priority                   4    AST's active                 ES
UIC                [00001,000004]    AST's remaining                21
Mutex count                     0    Buffered I/O count/limit       16/18
Waiting EF cluster              0    Direct I/O count/limit         17/18
Starting wait time       1B011B1B    BUFIO byte count/limit        128/896
Event flag wait mask     81E86600    # open files allowed left      10
Local EF cluster 0       C0000001    Timer entries allowed left      8
Local EF cluster 1       80000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count         560
Global cluster 3 pointer 00000000    Global WS page count          184

and SHOW PROCESS /CHANNEL:

Process index: 0026   Name: NMBD   Extended PID: 202000A6
---------------------------------------------------------

%SDA-W-NOACCESS, process not accessible (swapped out or suspended)


                            Process active channels
                            -----------------------

Channel  Window           Status        Device/file accessed
-------  ------           ------        --------------------
  0010  00000000                        DKA0:
%SDA-E-NOREAD, unable to access location 8363B7EC


Here's SHOW PROCESS for the SMBD process that was left running:

Process index: 0024   Name: SMBD_BG1152   Extended PID: 20204824
----------------------------------------------------------------
Status : 00240023 res,delpen,respen,phdres,netwrk
Status2: 00000001 quantum_resched
PCB address              81EF9100    JIB address              81EAC700
PHD address              8368B800    Swapfile disk address    00000000
Master internal PID      00900024    Subprocess count                0
Internal PID             00900024    Creator internal PID     00000000
Extended PID             20204824    Creator extended PID     00000000
State                       RWAST    Termination mailbox          0013
Current priority                8    AST's enabled                KESU
Base priority                   6    AST's active                 S
UIC                [00001,000004]    AST's remaining              4195
Mutex count                     0    Buffered I/O count/limit      511/512
Waiting EF cluster              0    Direct I/O count/limit       4094/4096
Starting wait time       1B011919    BUFIO byte count/limit     ******/2046848
Event flag wait mask     00000001    # open files allowed left     294
Local EF cluster 0       80000000    Timer entries allowed left     30
Local EF cluster 1       80000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count        2321
Global cluster 3 pointer 00000000    Global WS page count          339

What's up with the asterisks in the BUFIO byte count?

And SHOW PROCESS /CHANNEL:


                            Process active channels
                            -----------------------

Channel  Window           Status        Device/file accessed
-------  ------           ------        --------------------
  0010  00000000                        DKA0:
  0020  81DD4FC0                        DKA0:[SAMBA.BIN]SMBD.EXE;1
  0030  81DCB700                        DKA0:[VMS$COMMON.SYSLIB]SECURESHRP.EXE;1 (section file)
  0040  81DCE080                        DKA0:[VMS$COMMON.SYSLIB]SECURESHR.EXE;1 (section file)
  0050  81DD06C0                        DKA0:[VMS$COMMON.SYSLIB]LIBRTL.EXE;1 (section file)
  0060  81DC8940                        DKA0:[VMS$COMMON.SYSEXE]DCL.EXE;1 (section file)
  0070  81DC5040                        DKA0:[VMS$COMMON.SYSLIB]UVMTHRTL.EXE;1 (section file)
  0080  81DE1340                        DKA0:[VMS$COMMON.SYSLIB]DCLTABLES.EXE;81 (section file)
  0090  81F0F600             Busy       DKA0:[SYS0.SYSMGR]SMBD_STARTUP.LOG;210
  00A0  81E9B300                        DKA0:[SAMBA.BIN]SMBD_STARTUP.COM;7
  00B0  81DD1780                        DKA0:[VMS$COMMON.SYSLIB]DECC$SHR.EXE;3 (section file)
  00C0  81DD1980                        DKA0:[VMS$COMMON.SYSLIB]CMA$TIS_SHR.EXE;1 (section file)
  00D0  81DD1740                        DKA0:[VMS$COMMON.SYSLIB]UCX$IPC_SHR.EXE;1 (section file)
  00E0  81DCFF40                        DKA0:[VMS$COMMON.SYSLIB]TCPIP$ACCESS_SHR.EXE;1 (section file)

Process index: 0024   Name: SMBD_BG1152   Extended PID: 20204824
----------------------------------------------------------------

Channel  Window           Status        Device/file accessed
-------  ------           ------        --------------------
  00F0  00000000                        BG1152:
  0100  81E8F540                        DKA0:[VMS$COMMON.SYSEXE]RIGHTSLIST.DAT;1
  0110  81E5C780                        DKA0:[SAMBA.PRIVATE]SECRETS.TDB;1
  0120  81E9B480                        DKA0:[SAMBA]LOG.SMBD;1
  0130  00000000             Busy       DKA0:


DKA0: is my system disk, so dismounting it just to free this process isn't an option!
I don't get it. I assume NMBD is what's hung up, but I don't know what SDA means
by "%SDA-E-NOREAD, unable to access location 8363B7EC", and its channel list doesn't show anything busy!
It also doesn't look like either process has exhausted any quota.
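
To make the quota comparison easier to repeat, here is a minimal Python sketch that pulls every "count/limit" pair out of a saved copy of the SDA SHOW PROCESS output. The function name quota_pairs is hypothetical. The output itself does not say whether the first number is the amount used or the amount remaining, so the sketch only reports the raw values; a run of asterisks is recorded as None because the actual number is not shown.

```python
import re

# Matches "<label> count/limit   <count>/<limit>", for example
# "Direct I/O count/limit         17/18". The count may be a run of
# asterisks when SDA does not display the number.
PAIR_RE = re.compile(r"([A-Za-z][A-Za-z /]*?count/limit)\s+(\d+|\*+)/(\d+)")

def quota_pairs(sda_text):
    """Return (label, count, limit) tuples; count is None if not shown."""
    pairs = []
    for label, count, limit in PAIR_RE.findall(sda_text):
        value = None if count.startswith("*") else int(count)
        pairs.append((label.strip(), value, int(limit)))
    return pairs

# Two lines copied from the SHOW PROCESS output above.
sample = """\
Mutex count                     0    Buffered I/O count/limit       16/18
Starting wait time       1B011919    BUFIO byte count/limit     ******/2046848
"""

for label, count, limit in quota_pairs(sample):
    shown = "not shown" if count is None else count
    print(f"{label}: {shown} of {limit}")
```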

----------------------------------
        Stephen Eickhoff
      operagost at email.com
----------------------------------




