rsync 3.0.9 hangs when syncing from NFSv3 share - possible to retry after timeout?

Kevin Korb kmk at sanitarium.net
Fri Sep 6 22:53:56 CEST 2013


Is there a particular reason you aren't using rsync's own protocol or
rsync over ssh as the transport instead of NFS?  Because both paths
look local to rsync, you are stuck with --whole-file in this
configuration, not to mention the expense of doing a ton of stat()
calls over NFS.
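
For illustration, the backup command from the original post could pull
straight from the fileserver over ssh instead of reading the NFS mount.
This is only a sketch: it assumes the backup server can ssh to the host
"fileserver" and that rsync is installed there. The function name
rsync_ssh_cmd and the RUN variable are made up for this example; the
script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: assumes ssh access to "fileserver" and rsync installed on it.
# Paths are taken from the original post; adjust to your layout.

rsync_ssh_cmd() {
    # $1 = share name, $2 = previous snapshot date, $3 = new snapshot date
    echo rsync -av --modify-window=2 \
        --link-dest="/mnt/backups/$1/$2" \
        --exclude .svn/ \
        -e ssh "fileserver:/mnt/storage/$1/" "/mnt/backups/$1/$3"
}

cmd=$(rsync_ssh_cmd share3 2013-09-04 2013-09-05)
if [ "${RUN:-0}" = 1 ]; then
    $cmd            # word-splitting is safe here: no argument contains spaces
else
    echo "$cmd"     # dry run: just show what would be executed
fi
```

With this layout the file-list scan runs against the fileserver's local
disk, so the stat() calls no longer cross the network, and rsync's
delta-transfer algorithm becomes available again.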

Also, you can use lsof to see exactly what file or directory rsync has
open.
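
A minimal sketch of that check, assuming a Linux host. PID 1915 is the
rsync process shown in state D in the ps output quoted below. The
open_files helper and its /proc fallback are additions for this example,
for machines where lsof is missing or fails.

```shell
#!/bin/sh
# List the files a process currently has open.
# Prefers lsof; falls back to /proc/PID/fd (Linux only) if lsof is
# unavailable or returns an error.
open_files() {
    pid=$1
    if command -v lsof >/dev/null 2>&1 && lsof -p "$pid" 2>/dev/null; then
        return 0
    fi
    # Each entry in /proc/PID/fd is a symlink to an open file or socket.
    ls -l "/proc/$pid/fd"
}

# Example (run as root on the backup server while rsync is stuck):
#   open_files 1915
# Or for every running rsync:
#   for pid in $(pgrep -x rsync); do open_files "$pid"; done
```

Any path under /mnt/share3 in the output is a candidate for the file or
directory on which the NFS read is blocked.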

On 09/06/13 15:55, Andrew Martin wrote:
> Hello,
> 
> I'm using rsync 3.0.9 to back up several NFS shares from a
> fileserver, mounted over NFSv3, to a local RAID on a backup server.
> Both servers are running Ubuntu 12.04 server LTS. The fileserver's
> filesystem is ext4. The NFS shares are mounted on the backup server
> as follows:
> 
> fileserver:/mnt/storage/share1 /mnt/share1 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
> fileserver:/mnt/storage/share2 /mnt/share2 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
> fileserver:/mnt/storage/share3 /mnt/share3 type nfs (ro,tcp,bg,soft,intr,addr=192.168.1.1)
> 
> These shares contain a large number of files, including SVN
> checkouts, extracted kernel trees, etc. I've run into a problem
> where rsync will appear to hang or block indefinitely when backing
> up one particular share, share3, but occasionally it will happen
> with one of the other shares instead. A cron job starts backing up
> share3 nightly at 20:15. When this blocking problem does not occur,
> the backup typically finishes around 20:45. However, when this
> problem occurs, rsync blocks indefinitely. I have configured rsync
> to run using the "timeout" command so that it will be killed if not
> finished by 9:00 the next day:
> 
> timeout -k 30s 764m rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> 
> The exit code is 137, which I believe is 128 (from rsync) plus 9
> sent by timeout.
> 
> Here are the child rsync processes; as you can see, 1915 is in
> uninterruptible sleep, but I believe that is normal:
> 
> root      1914  0.0  0.0  10148   492 ?        S    Sep05   0:00 timeout -k 30s 764m rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> root      1915  0.0  0.3  81240 27784 ?        D    Sep05   0:20 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> root      1916  0.0  0.2 120028 19032 ?        S    Sep05   0:22 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> root      1917  0.0  0.3 138272 26612 ?        S    Sep05   0:07 rsync -av --modify-window=2 --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ /mnt/share3/ /mnt/backups/share3/2013-09-05
> 
> Running strace on the processes shows that the processes are not
> actively doing anything:
> 
> # strace -p 1914
> Process 1914 attached - interrupt to quit
> wait4(1915,
> 
> # strace -p 1915
> Process 1915 attached - interrupt to quit
> 
> # strace -p 1916
> Process 1916 attached - interrupt to quit
> select(4, [3], [], NULL, {10, 731653}^C <unfinished ...>
> Process 1916 detached
> 
> # strace -p 1917
> Process 1917 attached - interrupt to quit
> select(1, [0], [], NULL, {27, 691627}^C <unfinished ...>
> Process 1917 detached
> 
> Based on the output in my rsync log file, I can see the last
> directory that it copied a file from. I ran "time find
> /path/to/that/dir -type f" on that directory and some other
> directories on share3 and all of them returned quickly; I was not
> able to make "find" block. The rsync cron jobs that run for share1
> and share2 typically complete successfully, and those shares are
> also mounted over NFS with the same mount options from the same
> fileserver.
> 
> I do not see anything obviously related in dmesg on either the
> backup server or fileserver. Does anyone have an idea of what is
> causing rsync to hang, or whether there is a way to have it retry or
> skip a file if there is a problem rather than blocking forever? The
> --timeout option seems like it will abort the entire sync, but I
> would like to just skip over the bad section and continue with the
> rest of the backup. Is this possible?
> 
> Thanks,
> 
> Andrew
> 
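
On the exit status and the retry question: GNU timeout exits with 124
when the command timed out and stopped after TERM, and with 128+9 = 137
when the -k KILL was needed. A status of 137 here therefore suggests
that rsync did not exit within the 30s grace period, which would fit a
process stuck in state D. Below is a minimal sketch of decoding the
status and retrying the whole run. describe_status and run_with_retry
are hypothetical helper names, and the attempt count and delay are
assumptions. This retries the entire sync; it cannot skip one stuck
file.

```shell
#!/bin/sh
# Translate an exit status from "timeout ... rsync ..." into words.
describe_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    elif [ "$rc" -eq 124 ]; then
        echo "timed out"                          # GNU timeout's own status
    elif [ "$rc" -gt 128 ]; then
        echo "killed by signal $((rc - 128))"     # e.g. 137 -> signal 9 (KILL)
    else
        echo "rsync error $rc"
    fi
}

# run_with_retry MAX DELAY COMMAND [ARGS...]
# Runs COMMAND up to MAX times, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, otherwise the last failing status.
run_with_retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        else
            rc=$?
        fi
        echo "attempt $attempt: $(describe_status "$rc")" >&2
        [ "$attempt" -ge "$max" ] && return "$rc"
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example, using the command from the original post:
#   run_with_retry 3 60 timeout -k 30s 764m rsync -av --modify-window=2 \
#       --link-dest=/mnt/backups/share3/2013-09-04 --exclude .svn/ \
#       /mnt/share3/ /mnt/backups/share3/2013-09-05
```

Because each attempt writes into the same destination directory, a
later attempt only has to transfer what the earlier one did not finish.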

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
	Orlando, Florida		kmk at sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~