files locked forever -- logs! :)
syz at dds.nl
Thu Jun 21 18:29:38 GMT 2001
David Lee wrote:
-- snip --
> It does indeed seem to be trying to grab the oplock from another process,
> in this case 18071.
> When this fault strikes, check whether that process (whatever its number)
> really exists. (Indeed, if you have the log files from the above incident
> still lying around, you might even be able to trace that incident.)
Yeah, I still have the 1.5 GB log; it takes 'grep' some time to find anything
in it, though :)). Although I've now dd'd it into 3 parts :).
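For what it's worth, here is a rough sketch of scanning such a log in one streaming pass instead of splitting it up first. It assumes every entry starts with a "[date time, level]" header line like the excerpts in this mail, with the message lines following it. `HEADER` and `entries_for_pid` are names made up for illustration, not anything from Samba.

```python
import re
from typing import Iterable, Iterator, List

# Assumed header shape, taken from the excerpts in this mail:
# "[2001/06/21 12:34:39, 5] lib/util.c:show_msg(303)"
HEADER = re.compile(r"^\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}, +\d+\] ")


def entries_for_pid(lines: Iterable[str], pid: int) -> Iterator[List[str]]:
    """Yield each log entry (header plus message lines) mentioning 'pid <pid>'."""
    # Word boundaries keep pid 1807 from matching inside pid 18071.
    wanted = re.compile(r"\bpid %d\b" % pid)
    entry: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if HEADER.match(line) and entry:
            # A new header closes the previous entry.
            if any(wanted.search(part) for part in entry):
                yield entry
            entry = []
        entry.append(line)
    # Do not forget the last entry in the file.
    if entry and any(wanted.search(part) for part in entry):
        yield entry
```

Fed an open file object (for example `open(path, errors="replace")`, with `path` being wherever the log lives), this reads line by line and never holds more than one entry in memory.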
> See if there is a set of messages such as:
> [2001/06/18 14:16:23, 0] ../lib/fault.c:fault_report(40)
> [2001/06/18 14:16:23, 0] ../lib/fault.c:fault_report(41)
> INTERNAL ERROR: Signal 11 in pid 26673 (2.2.0)
> Please read the file BUGS.txt in the distribution
> [2001/06/18 14:16:23, 0] ../lib/fault.c:fault_report(43)
> [2001/06/18 14:16:23, 0] ../lib/util.c:smb_panic(1139)
> PANIC: internal error
I couldn't find anything like this; the last thing I see in the logs about pid 18071
(except the failed oplock_break msgs sent to the process) is:
[2001/06/21 12:34:39, 5] lib/util.c:show_msg(303)
[2001/06/21 12:34:39, 5] lib/util.c:show_msg(308)
[2001/06/21 12:34:39, 3] smbd/process.c:switch_message(650)
switch message SMBreadX (pid 18071)
[2001/06/21 12:34:39, 4] smbd/uid.c:become_user(114)
Skipping become_user - already user
[2001/06/21 12:34:39, 10] smbd/fileio.c:seek_file(63)
seek_file: requested pos = 3307008, new pos = 3307008
[2001/06/21 12:34:39, 10] lib/util_sock.c:read_smb_length_return_keepalive(602)
got smb length of 146
-- etc --
Nothing special (after that), no errors or whatever :/.
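To rule out the fault report quoted above, the logs can be searched for that exact message. A small sketch, assuming the wording is always as in David's excerpt ("INTERNAL ERROR: Signal N in pid P"); `FAULT` and `find_faults` are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple

# Pattern copied from the fault report quoted in this thread (an assumption
# that other Samba versions word it the same way).
FAULT = re.compile(r"INTERNAL ERROR: Signal (\d+) in pid (\d+)")


def find_faults(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (signal, pid) for every fault report found in the given lines."""
    hits = []
    for line in lines:
        match = FAULT.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits
```

An empty result for the pid in question would match what I saw: no crash report, the entries for that process just stop.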
I restarted smb as soon as the apps were locked, so I also can't trace back
whether pid 18071 was still alive.
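Next time the check could be done before restarting. A minimal sketch, assuming a Unix-like system where sending signal 0 only tests whether the process exists; `pid_exists` is a hypothetical helper, not part of Samba.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this pid currently exists (Unix only)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        # Signal 0 delivers nothing; the kernel only does the existence and
        # permission checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

Note that a pid can be reused by an unrelated process, so a True result only says that something with that number is running.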
> In my case this was pid 26673 (your 18071) reporting its own untimely
> demise. As a result of this, I see other processes getting stuck when
> trying to grab oplocks from that (now absent) process.
I see, maybe it should then just remove the oplock? Of course, only if it's
an exclusive oplock (so only the client [who just died] had that file locked
and it can be safely removed). Also, I understand the _real_ source of the
problem (some smbd process which crashed) should be corrected too, but
double protection against this is always nice in my opinion :).
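To make the idea concrete, here is a toy sketch of such a cleanup pass. It is not how smbd stores oplocks: `OplockRecord`, the `table` dict and `reap_stale_oplocks` are all made up for illustration, and the liveness check is passed in as a function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OplockRecord:
    """Hypothetical record: which process holds the oplock, and of what kind."""
    holder_pid: int
    exclusive: bool


def reap_stale_oplocks(
    table: Dict[str, OplockRecord],
    is_alive: Callable[[int], bool],
) -> List[str]:
    """Drop exclusive oplocks whose holder is gone; return the freed file names."""
    removed = []
    # Iterate over a copy so entries can be deleted while looping.
    for name, record in list(table.items()):
        if record.exclusive and not is_alive(record.holder_pid):
            del table[name]
            removed.append(name)
    return removed
```

Only exclusive oplocks are touched, as proposed above; anything else held by the dead process is left for a proper fix.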
> I understand (see other thread running on this list for last couple of
> days) that there is a known problem of this nature in 2.2.0, which is
> apparently corrected in the forthcoming release.
Sounds good :).
> Apparently, a process holding an oplock goes away unexpectedly: it leaves
> behind the "======" log, but because of the nature of the exit, cannot
> clear its oplocks, which are left trailing. Future processes don't
> detect this combination of circumstances (oplocks from deceased process)
> and themselves trip over...
> > I'm looking in the samba source code
> > for just two days or something.
> See if it classifies as above. If so, your choices are probably:
> 1. backtrack to 2.0.7 (ideally 2.0.9);
> 2. try to live with and firefight the problem for now (what we are doing);
> basically it means re-starting samba on your server.
> 3. (not recommended unless totally expert): update your source from CVS.
I added 'oplocks = no' to smb.conf; it may be a bit (I heard 30% avg?) slower
than with oplocks, but until it's fixed that's OK with me :)
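Roughly what that looks like, assuming the setting goes in the [global] section so that it becomes the default for all shares (it can also be set per share instead):

```ini
[global]
    oplocks = no
```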
> > I hope somebody can look at it, since it's a bit difficult for me to
> > trace the error exactly (and to fix it).
> As I say, I'm totally inexpert on oplocks. But it sounds something like a
> "known problem", which may be firefightable until the next release.
Hope so, then I will be able to really enjoy the new samba 2.2 (: