Rsync takes long time to finish

Brian K. White brian at aljex.com
Thu Apr 12 16:16:46 MDT 2012


You can try switching to a faster filesystem (reiserfs/ext4/btrfs/zfs),
enabling metadata performance options and other tuning (dir_index,
noatime), and upgrading disks and RAM. But mostly, with a frankly
unrealistic business requirement like that, you have to either tell the
business that the requirement can't be promised, only striven for, or
develop your own system outside of rsync to detect the changes and then
rsync those files specifically.

For instance, install incrond and make an incron job that watches those
directories and fires off an rsync for just that file every time a file
changes. You will still want to run a full regular rsync periodically
from cron, because incron is event based, not a spooler. Events can be
missed once in a while for any number of reasons (incrond is not running
because the server is starting up, shutting down or upgrading software;
your script failed for some events; incrond crashed or was killed;
etc.), so you need a regular cron job that periodically does a full
normal rsync to catch anything that might have been missed.
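
A rough sketch of what that could look like, to make the idea concrete.
The /reports path, the 10.10.10.100 host and ex_file.txt come from the
original post; the script name, the lock directory, the hourly schedule
and the incrontab line are all assumptions for illustration. As far as I
know incron does not watch subdirectories recursively, so with a tree
this deep each leaf directory would need its own incrontab entry.

```shell
#!/bin/sh
# Sketch only, not a tested production script.
#
# Hypothetical incrontab entry (incron expands $@ to the watched directory
# and $# to the name of the file that triggered the event):
#   /reports/folder1/2012-04-12/folder2 IN_CLOSE_WRITE,IN_MOVED_TO /usr/local/bin/reports-sync.sh one $@/$#
# Hypothetical crontab entry for the periodic catch-all run:
#   0 * * * * /usr/local/bin/reports-sync.sh full

SRC_ROOT="/reports"                    # from the original post
DEST="10.10.10.100:/reports/"          # from the original post
EXCLUDE_FILE="ex_file.txt"             # from the original post
LOCK_DIR="${LOCK_DIR:-/tmp/reports-sync.lock}"   # assumed location
RSYNC="${RSYNC:-rsync}"                # set RSYNC=echo to print instead of run

# Sync a single changed file, keeping its path below SRC_ROOT.
# With --relative, everything after the "/./" marker is recreated under DEST.
sync_one() {
    file="$1"
    case "$file" in
        "$SRC_ROOT"/*) rel=${file#"$SRC_ROOT"/} ;;
        *) echo "skip: $file is outside $SRC_ROOT" >&2; return 1 ;;
    esac
    $RSYNC -a --relative "$SRC_ROOT/./$rel" "$DEST"
}

# Full sync to catch missed events. mkdir either creates the directory or
# fails, so it serves as a simple lock against overlapping cron runs.
full_sync() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "full sync already running, skipping" >&2
        return 0
    fi
    $RSYNC -a --delete --exclude-from "$EXCLUDE_FILE" "$SRC_ROOT/" "$DEST"
    status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}

case "${1:-}" in
    one)  sync_one "$2" ;;
    full) full_sync ;;
esac
```

Running it with RSYNC=echo prints the rsync arguments instead of
transferring anything, which is a cheap way to check the paths first. If
the script is killed mid-run the lock directory is left behind and has to
be removed by hand.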

The end result is, barring missed events, all files are synced 
immediately when they are changed, not every 10 minutes.
That may not be good for you though. It depends what the application 
does. If the application is updating hundreds of files constantly, this 
won't work at all.

You may want to investigate distributed filesystems instead of rsync jobs.

-- 
bkw

On 4/12/2012 4:28 PM, vijay patel wrote:
> Thanks friends. We are using Red Hat Linux 5.8 on both the Production and
> Disaster Recovery sides. By drilling down we have found that it is taking
> a lot of time to check what has changed, while data transfer is very fast.
> As I mentioned, there is very little data in these folders (hardly 40GB),
> and whenever a new file is created, it is at most 30KB.
>
> Since we have to sync the production environment to DR every 10 mins as
> per the business requirement, I have to schedule it via cron. I am already
> using this distributed folder structure. I already have another rsync
> job which runs every 5 mins on another folder structure. It is running
> fine. Is there any option I can use with rsync to make this folder check
> faster?
>
> Regards,
> Vijay
>
>
>
>  > From: Matthew.Stier at us.fujitsu.com
>  > To: kmk at sanitarium.net; rsync at lists.samba.org
>  > Subject: RE: Rsync takes long time to finish
>  > Date: Thu, 12 Apr 2012 19:29:03 +0000
>  >
>  > The first clause should read "does not parallelize".
>  >
>  >
>  > -----Original Message-----
>  > From: rsync-bounces at lists.samba.org
> [mailto:rsync-bounces at lists.samba.org] On Behalf Of Stier, Matthew
>  > Sent: Thursday, April 12, 2012 3:07 PM
>  > To: Kevin Korb; rsync at lists.samba.org
>  > Subject: RE: Rsync takes long time to finish
>  >
>  > And, although rsync does parallelize, nothing stops you from running
> multiple instances of rsync.
>  >
>  > I had to transfer files from system A to system B, and being limited
> by the processing power of a single thread of rsync, I drilled down one
> level and ran rsyncs against each first-level file and
> subdirectory. This put more threads/cores/processors to work and made
> better use of the network bandwidth to get the job done.
>  >
>  > When all the rsyncs finished, I ran a single root-level rsync to
> catch the stragglers.
>  >
>  > If you have the processing power, use it.
>  >
>  >
>  > -----Original Message-----
>  > From: rsync-bounces at lists.samba.org
> [mailto:rsync-bounces at lists.samba.org] On Behalf Of Kevin Korb
>  > Sent: Thursday, April 12, 2012 2:44 PM
>  > To: rsync at lists.samba.org
>  > Subject: Re: Rsync takes long time to finish
>  >
>  > -----BEGIN PGP SIGNED MESSAGE-----
>  > Hash: SHA1
>  >
>  > Several suggestions...
>  >
>  > Add a lockfile to your cron job so it doesn't run two instances at the
>  > same time and you don't have to predict the run time.
>  >
>  > Make sure you are running rsync version 3+ on both systems. It has
>  > significant performance benefits over version 2.
>  >
>  > Run a job manually and add --itemize-changes and --progress. Try to
>  > figure out where most of the time is spent: looking for something to
>  > transfer, transferring new files, or updating changed files.
>  >
>  > If it is mostly looking for something to transfer, then you need
>  > filesystem optimizations, such as directory indexing. You didn't
>  > specify the OS or anything, but if you are on Linux this is where an
>  > ext3 > ext4 conversion would be helpful.
>  >
>  > If it is mostly transferring new files, then look at the network
>  > transfer rate. If it is low, then try optimizing the ssh portion. Try
>  > using -e 'ssh -c arcfour' or try using the hpn version of openssh. If
>  > encryption isn't important you could also set up rsyncd.
>  >
>  > If it is mostly updating existing files check the itemize output to
>  > see if the files really need updating. For instance if something is
>  > screwing with your timestamps that will create a bunch of extra work
>  > for rsync. Also, --inplace might help performance but be sure to read
>  > about it.
>  >
>  > On 04/12/12 14:29, vijay patel wrote:
>  > > Hi Friends,
>  > >
>  > > I am using rsync to copy data from a Production file server to a
>  > > Disaster Recovery file server. I have a 100Mbps link set up between
>  > > these two servers. The folder structure is very deep. It has paths
>  > > like /reports/folder1/date/folder2/file.tx, where we have 1600
>  > > directories like 'folder1', daily folders since last year at the
>  > > date level, and 2 folders for each date folder, like folder2, which
>  > > ultimately contain the files. The files are not too big; it is just
>  > > the design of the folder structure that is complex. The folder
>  > > structure is designed by the application and we can't change it at
>  > > the moment. I am using the following command in cron to run rsync.
>  > >
>  > > rsync -avh --delete --exclude-from 'ex_file.txt' /reports/
>  > > 10.10.10.100:/reports/ | tee /tmp/rsync_report.out >>
>  > > /tmp/rsync_report.out.$today
>  > >
>  > > Initially we were running it every 5 mins, then we increased it to
>  > > every 30 mins since one instance was not finishing in 5
>  > > mins. Now we have made it run every 8 hours because of the large
>  > > number of folders. Is there a way I can improve the performance of
>  > > my rsync?
>  > >
>  > >
>  > > Regards, Vijay
>  > >
>  > >
>  > >
>  >
>  > - --
>  >
> ~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
>  > Kevin Korb Phone: (407) 252-6853
>  > Systems Administrator Internet:
>  > FutureQuest, Inc. Kevin at FutureQuest.net (work)
>  > Orlando, Florida kmk at sanitarium.net (personal)
>  > Web page: http://www.sanitarium.net/
>  > PGP public key available on web site.
>  >
> ~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
>  > -----BEGIN PGP SIGNATURE-----
>  > Version: GnuPG v2.0.17 (GNU/Linux)
>  > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>  >
>  > iEYEARECAAYFAk+HIoMACgkQVKC1jlbQAQddkACeOljjKSj/NVpc4dj6+Hjm946j
>  > 9IsAoPNV4DrbTtH5Yj8Zk7p/2O8JacE3
>  > =LsDJ
>  > -----END PGP SIGNATURE-----
>  > --
>  > Please use reply-all for most replies to avoid omitting the mailing list.
>  > To unsubscribe or change options:
> https://lists.samba.org/mailman/listinfo/rsync
>  > Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
>
>


