rsync takes a long time to start doing any transfers

Cary Lewis cary.lewis at gmail.com
Thu Jul 26 16:26:37 MDT 2012


Thanks so much for the info. It does appear as though rsync scans the
entire subdir before doing anything, which seems pretty inefficient.
Perhaps this will be improved in a future release. Although, maybe it has
to be this way so that the --delete options can work?

On Thu, Jul 26, 2012 at 4:42 PM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:

> On Thu, Jul 19, 2012 at 01:51:43PM -0400, Cary Lewis wrote:
> > I want to use rsync with a cloud based rsync provider to do off-site
> > backing up of a large (1TB) dataset which consists of 32 million+ files
> > spread out in 300 directories. So the number of files in any one
> > directory can be quite large (upwards of 2 million).
>
> You realize that stat() is a costly operation,
> especially if the inodes are cache cold, even more so if something else
> stresses the IO and VM subsystems on the box.
>
> On a moderately loaded box, recursively stating 3 million files
> occasionally took 90 minutes and more.  Doing the same once the inodes
> are cache-hot takes the same box under the same overall stress 30 to 90
> *seconds*.
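
To compare the cache-cold and cache-hot cases on your own box, a minimal
sketch along these lines could be used. It is not what rsync does
internally; it only walks a tree and calls lstat() on every entry, and the
function name stat_tree is made up for this example.

```python
import os
import time


def stat_tree(root):
    """Recursively lstat() every entry under root.

    Returns (number_of_entries, elapsed_seconds).
    """
    count = 0
    start = time.perf_counter()
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Force an explicit lstat() per entry, so the cost of
                # fetching each inode is included in the timing.
                os.lstat(entry.path)
                count += 1
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return count, time.perf_counter() - start
```

Calling stat_tree() twice in a row on the same directory should show the
difference: the first run pays for reading the inodes from disk, while the
second run is mostly served from the cache (if there is enough memory to
hold it).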
>
> Holding 3 million dentries and inodes cache-hot requires (on that box,
> anyways) ~5 gigabytes of slab memory (of 128 G available...).
>
> So if you want to regularly recursively stat (and that's what rsync
> needs to do) 32 million files, you better add more RAM, much more RAM,
> to your box.
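
As a rough back-of-the-envelope check of those numbers, the figure for
3 million files can be scaled up to 32 million. This assumes the memory
cost per file grows linearly and that "gigabyte" means 10^9 bytes; both are
assumptions, and the real per-inode cost depends on the filesystem and
kernel. The function name is made up for this example.

```python
def estimate_cache_bytes(n_files, ref_files=3_000_000, ref_bytes=5 * 10**9):
    """Scale the quoted reference point (5 GB for 3 million files) linearly."""
    # Integer arithmetic keeps the result exact.
    return n_files * ref_bytes // ref_files


estimate = estimate_cache_bytes(32_000_000)
print(f"{estimate / 10**9:.1f} GB")  # prints "53.3 GB"
```

So under these assumptions, keeping 32 million files cache-hot would need
somewhere above 50 GB of slab memory, or roughly 1.7 KB per file.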
>
> Also, you mention Cygwin.
> IIRC, by default, that will still treat file names as case*in*sensitive,
> so you get really bad (maybe O(N^2)?) behaviour
> when walking large directories.
>
> There was some setting which I do not remember right now,
> to tell rsync and/or cygwin to treat this as case-sensitive,
> which can seriously improve behaviour with large directories.
>
> > Rsync doesn't seem to cope with this well - even doing local copies in a
> > directory with several thousand files takes a long time to initiate any
> > transferring.
>
> I'm speculating here.
> But I thought the file list generation is still per sub-directory, so
> rsync would need to scan the current subdir fully before starting to work
> on the resulting partial file list.
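
To illustrate what a per-directory file list would mean in practice, here
is a small sketch. It is not rsync's code, only a model of the behaviour
speculated about above, and the function name is made up for this example.

```python
import os


def per_directory_file_lists(root):
    """Yield (directory, sorted_file_names) one directory at a time."""
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk() reads a directory completely before yielding it, so
        # no file name is available until the whole directory is listed.
        dirnames.sort()  # visit subdirectories in a stable order
        yield dirpath, sorted(filenames)
```

A consumer of this generator can start working as soon as the first
directory has been listed, without waiting for the whole tree. But with
2 million files in a single directory, that one listing still has to
finish before anything in it can be processed.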
>
> > I thought that with version 3, rsync was supposed to start transferring
> > before fully testing all of the files in a directory?
> >
> > I am using version 3.0.9 under Cygwin.
> >
> > Is there a command line switch I am supposed to use to force rsync to
> > start transferring more quickly?
> >
> > Any insight / suggestions would be most appreciated.
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> --