rsync using ssh get stuck

Mark, Oren oren.mark at intel.com
Tue Jan 31 07:51:26 GMT 2006


Hi,

 

We started working with rsync to sync data between remote sites. 

We started seeing many stuck rsync processes.

Usually it happens in the ssh stage while the ssh issues a "select"
syscall on fd #4 while it is long gone.

 

Here is an example:

 

root at ptsl2171:/root# ps -efwww |grep ekrimer

ekrimer  28619  4979  0 Jan28 ?        00:00:00
/var/netstar//lib/build_0134_19/nbjobleader.out
/arch/projects/gesher/gesher_high
/a/nfs/iil/proj/mpgarch/arch_vpool_1/ekrimer/ambig/task_nhm /netbatch
/a/nfs/iil/proj/mpgarch/arch_vpool_1/ekrimer/ambig/task_nhm/##post_exec_
1.vpool_idc.7496709 /netbatch/##post_exec_1.vpool_idc.7496709
/a/nfs/iil/proj/mpgarch/arch_vpool_1/ekrimer/ambig/task_nhm/##post_exec_
1.vpool_idc.7496709 /netbatch/##post_exec_1.vpool_idc.7496709 0 batch
ptsl2171 BATCH 1138446228 1138472243 1.vpool_idc.7496709 19  ,cputime
soft = unlimited,cputime hard = unlimited,filesize soft =
unlimited,filesize hard = unlimited,datasize soft = unlimited,datasize
hard = unlimited,stacksize soft = 8192,stacksize hard =
unlimited,coredumpsize soft = 0,coredumpsize hard = unlimited,openfiles
soft = 1024,openfiles hard = 8192,descriptors soft = 1024,descriptors
hard = 8192,addressspace soft = unlimited,addressspace hard =
unlimited,memorylocked soft = unlimited,memorylocked hard =
unlimited,maxproc soft = 16384,maxproc hard = 16384,memoryuse soft =
unlimited,memoryuse hard = unlimited null false false false 5 0
/nfs/site/proj/mpgarch/perf/tools/scripts/bin/arch_post.csh
/netbatch/ekrimer/task_nhm_296/runs
/nfs/site/proj/mpgarch/arch_vpool_1/ekrimer/ambig/results

ekrimer  28620 28619  0 Jan28 ?        00:00:00 /bin/csh -f
/nfs/site/proj/mpgarch/perf/tools/scripts/bin/arch_post.csh
/netbatch/ekrimer/task_nhm_296/runs
/nfs/site/proj/mpgarch/arch_vpool_1/ekrimer/ambig/results

ekrimer  28641 28620  0 Jan28 ?        00:00:00 /usr/intel/bin/rsync -e
ssh -azx --rsync-path=/usr/intel/bin/rsync
/netbatch/ekrimer/task_nhm_296/runs
rsync-mpgarch.iil.intel.com:/nfs/site/proj/mpgarch/arch_vpool_1/ekrimer/
ambig/results

ekrimer  28642 28641  0 Jan28 ?        00:00:00 ssh
rsync-mpgarch.iil.intel.com /usr/intel/bin/rsync --server -logDtprxz .
/nfs/site/proj/mpgarch/arch_vpool_1/ekrimer/ambig/results

root      7647  7606  0 20:22 pts/0    00:00:00 grep ekrimer

root at ptsl2171:/root# strace -p 28620

root at ptsl2171:/root# strace -p 28641

select(5, NULL, [4], NULL, {48, 20000} <unfinished ...>

root at ptsl2171:/root# ls -l /proc/28641/fd/5

lrwx------    1 ekrimer  arch           64 Jan 30 20:23 /proc/28641/fd/5
-> socket:[92942621]

root at ptsl2171:/root# strace -p 28642

select(4, [], [3], NULL, NULL <unfinished ...>

root at ptsl2171:/root# ls -l /proc/28642/fd/4

ls: /proc/28642/fd/4: No such file or directory

root at ptsl2171:/root# ls -l /proc/*/fd/* | grep 'socket:\[92942621\]'

ls: /proc/8035/fd/255: No such file or directory

ls: /proc/8035/fd/3: No such file or directory

ls: /proc/self/fd/255: No such file or directory

ls: /proc/self/fd/3: No such file or directory

lrwx------    1 ekrimer  arch           64 Jan 30 20:25 /proc/28641/fd/5
-> socket:[92942621]

 

Does anybody have an idea the reason for that?

 

Thanks,

Oren Mark

Intel - Israel Engineering Computing

Unix Server Platforms

oren.mark at intel.com <mailto:oren.mark at intel.com> 

(+) 972-4-865-5987

iNET: 465-5987

 

-------------- next part --------------
HTML attachment scrubbed and removed


More information about the rsync mailing list