[Samba] GFS and samba problem, again

Fri Oct 6 16:21:48 GMT 2006

Hi,

I tried "strace -f -ttT -o /tmp/smbd.out -p <smbd-pid>" to work out 
what's happening, and it seems that system calls like write, open 
and flock never finish until Samba is restarted.

4665  11:09:31.068381 kill(4666, SIG_0 <unfinished ...>
4665  11:09:31.068750 <... kill resumed> ) = -1 EPERM (Operation not permitted) <0.000310>
4665  11:09:31.068996 kill(4665, SIG_0 <unfinished ...>
4665  11:09:31.069260 <... kill resumed> ) = 0 <0.000205>
4665  11:09:31.069458 kill(4667, SIG_0 <unfinished ...>
4665  11:09:31.069617 <... kill resumed> ) = 0 <0.000099>
4665  11:09:31.069781 open("cint95-intel.mtw", O_RDONLY|O_LARGEFILE <unfinished ...>
4665  11:09:31.070150 <... open resumed> ) = 22 <0.000293>
4665  11:09:31.070396 geteuid32( <unfinished ...>
4665  11:09:31.070649 <... geteuid32 resumed> ) = 503 <0.000195>
4665  11:09:31.070937 write(19, "prova03 opened file cint95-intel"..., 67 <unfinished ...>
4665  11:09:31.071282 <... write resumed> ) = 67 <0.000261>
4665  11:09:31.071511 flock(22, 0x60 /* LOCK_??? */ <unfinished ...>
4665  11:09:31.071770 <... flock resumed> ) = 0 <0.000197>
4665  11:09:31.072127 write(5, "\0\0\0g\377SMB\242\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 107 <unfinished ...>
4665  11:09:31.072447 <... write resumed> ) = 107 <0.000212>
.....................................................................
4665  11:09:31.242316 <... geteuid32 resumed> ) = 503 <0.000118>
4665  11:09:31.242405 write(19, "close fd=22 fnum=6371 (numopen=2"..., 34) = 34 <0.000031>
4665  11:09:31.242572 nanosleep({0, 2000001},  <unfinished ...>
4667  11:09:31.245063 kill(4665, SIG_0) = 0 <0.000018>
4665  11:09:31.248047 <... nanosleep resumed> NULL) = 0 <0.005406>
4665  11:09:31.249355 nanosleep({0, 2000001}, NULL) = 0 <0.002621>
4665  11:09:31.252091 nanosleep({0, 2000001}, NULL) = 0 <0.003853>
4665  11:09:31.256088 nanosleep({0, 2000001}, NULL) = 0 <0.003906>
.................. a lot of nanosleeps ..............................
4665  11:10:04.887037 nanosleep({0, 2000001},  <unfinished ...>
4665  11:10:04.887219 <... nanosleep resumed> 0) = ? ERESTART_RESTARTBLOCK (To be restarted) <0.000111>
4665  11:10:04.888197 +++ killed by SIGKILL +++
4667  11:10:04.890712 kill(4665, SIG_0 <unfinished ...>
4666  11:10:04.920965 kill(4665, SIG_0) = -1 ESRCH (No such process) <0.000017>
4667  11:10:04.934486 kill(4665, SIG_0 <unfinished ...>
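
For anyone who wants to scan a long trace like this one without reading it 
by eye, here is a minimal Python sketch. It assumes the "strace -f -ttT" 
layout shown above (pid, timestamp, then the call) and reports, per pid, 
the last call that was left "<unfinished ...>" with no matching 
"<... resumed>" line. The function names join_records and pending_calls are 
my own, not part of strace or Samba, and lines that do not start with a pid 
and timestamp are treated as wrapped continuations of the previous line.

```python
import re

# A record starts with "<pid>  <HH:MM:SS.usec> <text>". Any other line is
# assumed to be a wrapped continuation of the previous record.
RECORD_START = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2}\.\d+)\s+(.*)$")
CALL_NAME = re.compile(r"^(\w+)\(")
RESUMED = re.compile(r"^<\.\.\. (\w+) resumed>")


def join_records(lines):
    """Yield (pid, timestamp, text) with wrapped lines glued back together."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECORD_START.match(line)
        if match:
            if current:
                yield tuple(current)
            current = [int(match.group(1)), match.group(2),
                       match.group(3).rstrip()]
        elif current and line.strip():
            current[2] += " " + line.strip()
    if current:
        yield tuple(current)


def pending_calls(lines):
    """Return {pid: (timestamp, syscall)} for calls that never resumed."""
    pending = {}
    for pid, timestamp, text in join_records(lines):
        if RESUMED.match(text):
            # The interrupted call for this pid has completed.
            pending.pop(pid, None)
            continue
        name = CALL_NAME.match(text)
        if name and "<unfinished ...>" in text:
            # Remember the call; it stays here if no "resumed" line follows,
            # even when the process is killed afterwards.
            pending[pid] = (timestamp, name.group(1))
    return pending
```

To use it on the file written by strace, something like this should do:

```python
with open("/tmp/smbd.out") as trace:
    for pid, (timestamp, call) in sorted(pending_calls(trace).items()):
        print(pid, timestamp, call)
```

Exit markers such as "+++ killed by SIGKILL +++" are deliberately ignored, 
so a call that was still blocked when smbd was killed is still reported.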

 >BTW, it is a _REALLY_ bad idea to export the same fs via two
 >cluster nodes at the same time with current Samba.

At the moment we aren't exporting the same fs via two cluster nodes, 
since Samba on node2 is stopped, and the problem remains.
Any help would be appreciated,

Sandra Hernàndez

Volker Lendecke wrote:
> On Wed, Oct 04, 2006 at 02:15:45PM +0200, sandra-llistes wrote:
>> When we try to access from a single Windows client it works fine, but 
>> when we try to access the same file from 2 or more Windows clients 
>> simultaneously, Windows hangs and so does Samba. This does not seem to 
>> happen with concurrent access to different files or with Linux clients.
> 
> To really figure out what's going on you need to strace the
> smbd process.
> 
> strace -ttT -o /tmp/smbd.out -p <smbd-pid>
> 
> If you have the hang then wait some seconds, kill the
> appropriate smbd and look at /tmp/smbd.out where the smbd
> has been stuck. 99% of the time it's in a filesystem-related call, and
> then it's a GFS problem. I'm pretty sure this is GFS because
> I do not see any reason why Samba itself would behave
> differently when running on two cluster nodes.
> 
> BTW, it is a _REALLY_ bad idea to export the same fs via two
> cluster nodes at the same time with current Samba. It
> _might_ be ok because you have one read only and only one
> r/w. If you had both r/w then data corruption would
> inevitably follow. We're right now working on a cluster
> version of Samba that would allow this properly.
> 
> Volker