Re(2): rsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(229)

Dave Dykstra dwd at bell-labs.com
Thu Feb 14 06:51:17 EST 2002


On Tue, Feb 12, 2002 at 07:00:10PM +0100, Rönnblom Janåke /Teknous wrote:
> dwd at bell-labs.com wrote:
> >Ah, I see you posted more details for the same problem.  That message comes
> >when one process on the receiver side interrupts the other because it is
> >about to die, and doesn't really say anything about what the real problem
> >might be.  I assume it's dying before 84600 seconds, right?  Have you ever
> 
> Yes, it dies after the same amount of time every time. A crash run takes
> about 20 minutes.
> >
> >tried it without -v?  That's been known to cause problems before, although
> >I'm not sure if they've been fixed.  Be sure to check to see if the log
> 
> Tried without -v just now; no difference.
> >
> >file on the server side has anything useful.  Have you ever tried pulling
> >the files from a server rather than pushing to it?  The rsync server mode
> >was never really intended for significant writing, more for reading, and it
> >has some quirks when used for writing (not to say that dying completely is
> >just a "quirk").
> >
> >Again, it's possible you're running out of memory with that many files.
> >
> Checked the Unix box and the process is using about 50MB (according to top)
> before crashing and the machine has 256MB and does nothing besides being
> subject to my rsync testing.
> 
> However, I did another test with 2.5.3pre and this is what I got on the
> client:
> 
> rsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(229)
> 
> and on the server:
> 
> select(4, [3], NULL, NULL, {10800, 0})  = 1 (in [3], left {10800, 0})
> read(3, "TEM.DAT", 7)                   = 7
> time(NULL)                              = 1013540288
> select(4, [3], NULL, NULL, {10800, 0})  = 1 (in [3], left {10800, 0})
> read(3, "\377\377\377\377", 4)          = 4
> time(NULL)                              = 1013540288
> select(4, [3], NULL, NULL, {10800, 0})  = 1 (in [3], left {10800, 0})
> read(3, "\374\2004:>\34\5P", 8)         = 8
> time(NULL)                              = 1013540288
> select(4, [3], NULL, NULL, {10800, 0})  = 1 (in [3], left {10800, 0})
> read(3, ".BAK", 4)                      = 4
> time(NULL)                              = 1013540288
> select(4, [3], NULL, NULL, {10800, 0})  = 1 (in [3], left {10800, 0})
> read(3, "m", 1)                         = 1
> time(NULL)                              = 1013540288
> select(4, [3], NULL, NULL, {10800, 0})  = 1 (in [3], left {10800, 0})
> read(3, "\3", 1)                        = 1
> time(NULL)                              = 1013540288
> select(4, [3], NULL, NULL, {10800, 0})  = 1 (in [3], left {10800, 0})
> read(3, "\0\0\262\313", 4)              = 4
> time(NULL)                              = 1013540288
> select(4, NULL, [3], NULL, {10800, 0})  = 1 (out [3], left {10800, 0})
> write(3, "R\0\0\10overflow: flags=0x6d l1=3 l2"..., 86) = 86
> time(NULL)                              = 1013540288
> select(4, NULL, [3], NULL, {10800, 0})  = 1 (out [3], left {10800, 0})
> write(3, "-\0\0\10ERROR: buffer overflow in re"..., 49) = 49
> time(NULL)                              = 1013540288
> rt_sigaction(SIGUSR1, {SIG_IGN}, {0x8050ed4, [], SA_RESTART|0x4000000}, 8) = 0
> rt_sigaction(SIGUSR2, {SIG_IGN}, {0x8050ef4, [], SA_RESTART|0x4000000}, 8) = 0
> select(4, NULL, [3], NULL, {10800, 0})  = 1 (out [3], left {10800, 0})
> write(3, "K\0\0\10rsync error: error allocatin"..., 79) = 79
> time(NULL)                              = 1013540288
> munmap(0x2aabf000, 4096)                = 0
> _exit(22)                               = ?
> 
> There is an error above about "error allocatin" memory, and "TEM.DAT" was
> also in the old trace (2.4.6), but top and ps afux said it used 50MB...
> 
> Nothing in the other logfiles.
> 
> How can I trace this error further?


Ah, that strace was quite helpful and showed a real error message which is
somehow getting lost.  The overflow message comes from receive_file_entry()
in flist.c and appears to be because the filename that's coming next is too
long, greater than around MAXPATHLEN.  Do you know what file to expect after
"TEM.DAT"?  The message was that l1 was 3, which you can see it reading in
the trace, and that means that l2 is ((0262 << 8) + 0313) which is 45771,
clearly not the length of a filename!  Something is getting mismatched in
the protocol.  I'm not sure where the "error allocatin" message is coming
from; it can come from many places.
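
To make the length arithmetic above easy to re-check, here is a small
sketch that decodes the four bytes from the last read() in the trace.
It is only an illustration, not rsync code: ASSUMED_MAXPATHLEN and the
overflows() helper are hypothetical names, the limit of 4096 is a
platform-dependent guess, and the thread does not establish which byte
order rsync uses for this field, so both readings are shown.

```python
# Bytes of the last read() in the trace: "\0\0\262\313" (octal escapes).
raw = bytes([0o000, 0o000, 0o262, 0o313])

# The arithmetic from the message: the two non-zero bytes, high byte first.
l2_as_posted = (raw[2] << 8) + raw[3]

# Two readings of the full 4-byte field. Which one applies is an open
# question here; both are computed for comparison.
l2_big_endian = int.from_bytes(raw, "big")
l2_little_endian = int.from_bytes(raw, "little")

# Assumption: a typical path length limit. The real value is platform
# dependent and is not stated in the thread.
ASSUMED_MAXPATHLEN = 4096

# l1 is the single byte "\3" read just before the 4-byte field.
l1 = 3


def overflows(l1, l2, maxpathlen=ASSUMED_MAXPATHLEN):
    """Hypothetical check: would l1 + l2 reach the path length limit?"""
    return l2 >= maxpathlen - l1


if __name__ == "__main__":
    for label, value in (
        ("as posted", l2_as_posted),
        ("big-endian", l2_big_endian),
        ("little-endian", l2_little_endian),
    ):
        print(label, value, "overflow" if overflows(l1, value) else "ok")
```

Under either reading the value is far beyond any plausible filename
length, which fits the conclusion that the sender and receiver have
fallen out of step in the protocol.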

- Dave



