[Samba] GFS and samba problem, again
sandra-llistes
sandra-llistes at fib.upc.edu
Fri Oct 6 16:21:48 GMT 2006
Hi,
I ran "strace -f -ttT -o /tmp/smbd.out -p <smbd-pid>" to find out
what's happening, and it seems that system calls like
write, open and flock never finish until samba is restarted.
4665 11:09:31.068381 kill(4666, SIG_0 <unfinished ...>
4665 11:09:31.068750 <... kill resumed> ) = -1 EPERM (Operation not permitted) <0.000310>
4665 11:09:31.068996 kill(4665, SIG_0 <unfinished ...>
4665 11:09:31.069260 <... kill resumed> ) = 0 <0.000205>
4665 11:09:31.069458 kill(4667, SIG_0 <unfinished ...>
4665 11:09:31.069617 <... kill resumed> ) = 0 <0.000099>
4665 11:09:31.069781 open("cint95-intel.mtw", O_RDONLY|O_LARGEFILE <unfinished ...>
4665 11:09:31.070150 <... open resumed> ) = 22 <0.000293>
4665 11:09:31.070396 geteuid32( <unfinished ...>
4665 11:09:31.070649 <... geteuid32 resumed> ) = 503 <0.000195>
4665 11:09:31.070937 write(19, "prova03 opened file cint95-intel"..., 67 <unfinished ...>
4665 11:09:31.071282 <... write resumed> ) = 67 <0.000261>
4665 11:09:31.071511 flock(22, 0x60 /* LOCK_??? */ <unfinished ...>
4665 11:09:31.071770 <... flock resumed> ) = 0 <0.000197>
4665 11:09:31.072127 write(5, "\0\0\0g\377SMB\242\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 107 <unfinished ...>
4665 11:09:31.072447 <... write resumed> ) = 107 <0.000212>
.....................................................................
4665 11:09:31.242316 <... geteuid32 resumed> ) = 503 <0.000118>
4665 11:09:31.242405 write(19, "close fd=22 fnum=6371 (numopen=2"..., 34) = 34 <0.000031>
4665 11:09:31.242572 nanosleep({0, 2000001}, <unfinished ...>
4667 11:09:31.245063 kill(4665, SIG_0) = 0 <0.000018>
4665 11:09:31.248047 <... nanosleep resumed> NULL) = 0 <0.005406>
4665 11:09:31.249355 nanosleep({0, 2000001}, NULL) = 0 <0.002621>
4665 11:09:31.252091 nanosleep({0, 2000001}, NULL) = 0 <0.003853>
4665 11:09:31.256088 nanosleep({0, 2000001}, NULL) = 0 <0.003906>
.................. a lot of nanosleeps ..............................
4665 11:10:04.887037 nanosleep({0, 2000001}, <unfinished ...>
4665 11:10:04.887219 <... nanosleep resumed> 0) = ? ERESTART_RESTARTBLOCK (To be restarted) <0.000111>
4665 11:10:04.888197 +++ killed by SIGKILL +++
4667 11:10:04.890712 kill(4665, SIG_0 <unfinished ...>
4666 11:10:04.920965 kill(4665, SIG_0) = -1 ESRCH (No such process) <0.000017>
4667 11:10:04.934486 kill(4665, SIG_0 <unfinished ...>
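
A trace like the one above gets long quickly, so it can help to summarize
it mechanically instead of reading it by eye. The following is only a
minimal sketch in Python, not part of Samba or strace: the function name
summarize() and the regular expressions are assumptions made for
illustration. It expects the raw /tmp/smbd.out written by
"strace -f -ttT", where every event is on one line (not the mail-wrapped
form). For each pid it reports the system call that was still marked
"<unfinished ...>" at the end of the file, and it counts how often each
call was issued, which makes a long polling loop easy to spot.

```python
import re
from collections import Counter

# "<pid> <HH:MM:SS.usec> <rest of the event>" as written by strace -f -tt
LINE_RE = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2}\.\d+)\s+(.*)$")
# Start of a system call, e.g. "flock(22, ..."
CALL_RE = re.compile(r"^(\w+)\(")
# Continuation of an interrupted call, e.g. "<... flock resumed> ) = 0"
RESUMED_RE = re.compile(r"^<\.\.\. (\w+) resumed>")


def summarize(lines):
    """Return (pending, counts) for an iterable of strace output lines.

    pending maps pid -> (timestamp, syscall) for calls that were started
    but never resumed before the trace ended.
    counts maps (pid, syscall) -> number of times the call was started.
    """
    pending = {}
    counts = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a trace event (blank line, wrapped text, ...)
        pid, stamp, rest = m.groups()
        rest = rest.strip()
        if RESUMED_RE.match(rest) or rest.startswith("+++"):
            # The call came back, or the process exited / was killed.
            pending.pop(pid, None)
            continue
        call = CALL_RE.match(rest)
        if not call:
            continue  # signal delivery lines and similar
        name = call.group(1)
        counts[(pid, name)] += 1
        if rest.endswith("<unfinished ...>"):
            pending[pid] = (stamp, name)
    return pending, counts
```

One possible way to use it is "pending, counts = summarize(open('/tmp/smbd.out'))"
followed by printing pending and counts.most_common(10). A filesystem
call that shows up in pending with an old timestamp would be a candidate
for the place where smbd is stuck.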
>BTW, it is a _REALLY_ bad idea to export the same fs via two
>cluster nodes at the same time with current Samba.
At the moment we aren't exporting the same fs via two cluster nodes,
since samba on node2 is stopped, and the problem remains.
Any help will be appreciated,
Sandra Hernàndez
Volker Lendecke wrote:
> On Wed, Oct 04, 2006 at 02:15:45PM +0200, sandra-llistes wrote:
>> When we try to access from a single windows client it works fine, but
>> when we try to access the same file from 2 or more windows clients
>> simultaneously, windows hangs and samba also does. This seems not to
>> happen with concurrent access to different files or with linux clients.
>
> To really figure out what's going on you need to strace the
> smbd process.
>
> strace -ttT -o /tmp/smbd.out -p <smbd-pid>
>
> If you have the hang then wait some seconds, kill the
> appropriate smbd and look at /tmp/smbd.out where the smbd
> has been stuck. 99% it's in a filesystem related call, and
> then it's a GFS problem. I'm pretty sure this is GFS because
> I do not see any reason why Samba itself would behave
> differently when running on two cluster nodes.
>
> BTW, it is a _REALLY_ bad idea to export the same fs via two
> cluster nodes at the same time with current Samba. It
> _might_ be ok because you have one read only and only one
> r/w. If you had both r/w then data corruption would
> inevitably follow. We're right now working on a cluster
> version of Samba that would allow this properly.
>
> Volker