Rsync dies

tim.conway at philips.com
Tue May 21 11:49:02 EST 2002


Sure.  Here's what I sent to Lennie.  I'm cc'ing the list in case it
raises more questions.

Here's the splitter program.  Yes, it's a recursive shell script.  However, 
the only active processes in the process tree will be the tips of the 
tree, and the bigger processes (sort, wc) will exit before children are 
called, so it won't suck up all your resources.
It takes an item limit and a filename: no directory it names will contain 
more than that many entries beneath it.
The list file is generated by find, run with relative paths and naming the 
top directory itself.
For instance, to split /www:
cd /
find www -print >/tmp/listfile
splitter 50000 /tmp/listfile >modulelist
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh

limit=$1        #maximum number of entries a reported directory may contain
file=$2         #find output (relative paths) to analyze

splitdir(){

dir=$1

#depth of this directory = number of path components
pathlength=`echo $dir |tr / ' '|wc -w`
pathlength=`echo $pathlength`   #strip the leading whitespace wc emits

#match only entries under this directory; on the initial call $dir is empty,
#so the pattern collapses to '^' and matches every line in the list
searchpat="^$dir/"
[ "$searchpat" = "^/" ] && searchpat='^'

#count the entries under each immediate subdirectory (find lists them
#consecutively, so uniq -c can do the counting); emit a subdirectory if it
#is small enough, otherwise recurse and split it further
grep "$searchpat" "$file" |
cut -d/ -f1-`expr $pathlength + 1` |
uniq -c |
while read dircount subdir
        do

                if [ "$dircount" -le "$limit" ]
                        then
                                echo $subdir
                        else
                                (splitdir $subdir) </dev/null
                fi

        done

}

#initial call: no argument, so the whole tree gets split from the top
splitdir
++++++++++++++++++++++++++++++++++++++++++++++
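For reference, driving rsync from the resulting module list is basically a
loop like this (a sketch only: desthost, /dest, and /tmp/modulelist are
placeholders, not my real setup):
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh
#Sketch only: desthost, /dest, and /tmp/modulelist are made-up names.
#Run the loop from the directory the original find was run from, so the
#relative paths in the module list still resolve.
cd / || exit 1

while read module
do
        #-a preserves perms/times/links; -R keeps the relative path, so each
        #module lands under its parent directory on the destination (never
        #sync module/* to the module name -- see the note further down).
        rsync -aR "$module" desthost:/dest || echo "rsync failed: $module" >&2
done < /tmp/modulelist
++++++++++++++++++++++++++++++++++++++++++++++
With --delete, keep in mind the caveats from my earlier mail (quoted below):
a deleted directory that never shows up in the list never gets its deletion
propagated, and hard links split across runs end up as separate copies.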
In my application, I actually run the find directly on the NAS box, and 
process the file with a much faster SunOS box.
Here's the script run on the NAS box:
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh

#200105something tcc wrote to get list of everything we push out on toolservers.
#The output is used as input to ToolSyncMakeModules, which gives back
#directories with no more than some limit (50000 seems to work well) of
#subcomponents (files AND directories, combined) for efficient rsync.
#Some of these "modules" may, in fact, be files.  rsync -a gets them.
#Important note!:  Must rsync module to parent directory destination, not
#module/* to module name.  If module is a symlink, you copy everything in
#under the link name, which is probably not what you want, and with the
#ensuredir function the link will be created as a dir, causing the synced
#files to actually exist separately in linked locations.

#As a low-cost safety, have added lines to fix ownerships.  Done locally
#on the toolserver; low performance cost.

basedir=/mnt/vol1
workdir=$basedir/ToolSyncModules

chgrp=/usr/bin/chgrp
chmod=/bin/chmod
chown=/usr/sbin/chown
date=/bin/date
echo=/bin/echo
find=/usr/bin/find
mv=/bin/mv
rm=/bin/rm


listfile=$workdir/Module.List
stagelistfile=$workdir/Module.List.stage
logfile=$workdir/makefull.log

if cd $basedir 2>/dev/null
then

{

#for time purposes
$date

#$find big big1 -print > $stagelistfile
$find big/*/* big1/*/* -print > $stagelistfile
$rm $listfile 2>/dev/null
$mv $stagelistfile $listfile

$chmod ugo+r $listfile

#just fix the ownerships, in case.  These are the numeric ids of user and
#group Tools
$chown -R 24 big big1 &
$chgrp -R 70 big big1 &

#for time purposes
$date
} >$logfile 2>&1

else

$echo "Can't cd to $basedir to create ToolSync raw file list"

fi
++++++++++++++++++++++++++++++++++++++++++++++
Here's the script run by the SunOS box
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh

#define directories used
basedir=/wan/sjt-tools-master1/sjt-tools-master1
listdir=$basedir/ToolSyncModules
moduledir=$listdir/ToolSyncModules
workdir=/tmp
tee=/bin/tee

#define scripts used - non-standard locations
splitter=$listdir/ToolSyncSplitModules

#define files used
listfile=$listdir/Module.List
logfile=$listdir/split.log
worklistfile=$workdir/ToolSyncModule.list
modulefile=$moduledir/ToolSyncModuleList
stagemodulefile=$listdir/ToolSyncModuleList
umask 022

{
#for timetrial purposes
date

#put it on a fast filesystem for reading
cp $listfile $worklistfile

#and write it to a slow filesystem (very little writing... read 1.5M lines,
#write 500)
$splitter 500000 $worklistfile > $stagemodulefile

#flash over so the file is either there or not, never partial.
#All the data is there, and we just change the name.
rm $modulefile 2>/dev/null
mv $stagemodulefile $modulefile

#clean up after ourselves
rm $worklistfile

#for timetrial purposes
date

} > $logfile 2>&1
++++++++++++++++++++++++++++++++++++++++++++++
I have actually quit using rsync for the full synchronization, and have 
written a pair of scripts that use find, sort, gzip, diff, and tar.  The 
basic idea took about 10 minutes to write, and worked, but I then 
optimized it to take advantage of our specific conditions and to add 
integrity/safety measures (to prevent catastrophic deletions).
I'd run rsync for 4 days, beat the hell out of the network, and have it 
die incomplete.  Now I can finish it in 3 hours (for a no-op; changes take 
more time to copy over).  You might want to consider a script-based 
solution if you're short on CPU/RAM, or are working over NFS.
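
Roughly, the idea looks like this (a sketch only: /www, desthost, and the
temp-file names are placeholders, and a real version should compare sizes
and mtimes as well as names, and sanity-check the diff before trusting it
with any deletions):
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh
#Sketch of the find/sort/gzip/diff/tar idea.  Paths and desthost are made up.
cd /www || exit 1

#sorted listing of everything currently here
find . -print | sort > /tmp/list.new

if [ -f /tmp/list.old ]
then
        #lines present only in the new listing are new (or renamed) entries
        diff /tmp/list.old /tmp/list.new | sed -n 's/^> //p' > /tmp/changed
else
        #first run: everything is "new"
        cp /tmp/list.new /tmp/changed
fi

#ship only the changed entries; gzip keeps the transfer small on a slow link
#(-T is GNU tar's "read the names to archive from a file")
tar cf - -T /tmp/changed | gzip | ssh desthost 'cd /www && gzip -dc | tar xf -'

mv /tmp/list.new /tmp/list.old
++++++++++++++++++++++++++++++++++++++++++++++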

Good luck.

Tim Conway
tim.conway at philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn, 
19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), 
".\n" '
"There are some who call me.... Tim?"




Lenny Foner <foner at media.mit.edu>
05/17/2002 02:10 PM

 
        To:     Tim Conway/LMT/SC/PHILIPS at AMEC
        cc:     foner at media.mit.edu
        Subject:        Rsync dies
        Classification: 



    Date: Fri, 17 May 2002 11:43:55 -0600
    From: tim.conway at philips.com

    I have some code that can be used to analyze your system before the
    sync, and choose directories containing no more than a maximum number
    of items below them.  Iterating through the list and using -R can let
    you get the whole thing run, though --delete and -H become less certain
    (not dangerous, but if you don't name anything containing a deleted
    directory because it didn't come up on your list, you'll never tell the
    destination to delete it, and if you have two hard links to the same
    file, but hit them in two separate runs, you now have two copies on
    disk).
    Let me know if you want it.  I'm sure you can figure how to modify it
    for your environment.

I'd be interested in this.  Tnx.








"Jurrie Overgoor" <jurr at tref.nl>
05/21/2002 12:30 PM
Please respond to "Jurrie Overgoor"

 
        To:     Tim Conway/LMT/SC/PHILIPS at AMEC
        cc: 
        Subject:        Re: Rsync dies
        Classification: 



Well, I'm not the one that started the thread, but I am interested in your
code nonetheless. Could you please mail it to me? Thanks in advance,

        Greetz -- Jurrie
        jurr at tref.nl

----- Original Message -----
From: <tim.conway at philips.com>
To: "C.Zimmermann" <clemens at prz.tu-berlin.de>
Cc: <rsync at lists.samba.org>; <rsync-admin at lists.samba.org>
Sent: Friday, May 17, 2002 7:43 PM
Subject: Re: Rsync dies


Yeah.  You'll have to find a way to break the job up into smaller pieces.
It's a pain, but I have a similar situation - 3M+ files in 130+ GB.  I
can't get the whole thing in one chunk, no matter how fast a server with
however much memory, even on Gb ethernet (for the server).  In my case,
the filesystem is on NAS, and the NAS has only 100bT simplex (half-duplex,
to some).
I have some code that can be used to analyze your system before the sync,
and choose directories containing no more than a maximum number of items
below them.  Iterating through the list and using -R can let you get the
whole thing run, though --delete and -H become less certain (not
dangerous, but if you don't name anything containing a deleted directory
because it didn't come up on your list, you'll never tell the destination
to delete it, and if you have two hard links to the same file, but hit
them in two separate runs, you now have two copies on disk).
Let me know if you want it.  I'm sure you can figure how to modify it for
your environment.

Tim Conway
tim.conway at philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn,
19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970),
".\n" '
"There are some who call me.... Tim?"




"C.Zimmermann" <clemens at prz.tu-berlin.de>
Sent by: rsync-admin at lists.samba.org
05/17/2002 02:08 AM


        To:     <rsync at lists.samba.org>
        cc:     (bcc: Tim Conway/LMT/SC/PHILIPS)
        Subject:        Rsync dies
        Classification:



I'm trying to rsync a 210 GB filesystem with approximately 1,500,000 files.

Rsync always dies after about 29 GB without any error messages.
I'm using rsync version 2.5.5, protocol version 26.

Does anyone have an idea?

Thanks, Clemens



--
To unsubscribe or change options:
http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



