rsync mechanics question

Thu May 10 17:51:57 GMT 2007

Thanks everyone for the assistance.

The -H option was the missing element. The message store uses hard links
to keep a single copy of each mass-distributed email on disk, and
hard-links that one file into every recipient's mailbox.

Without -H, my rsync was writing a separate full copy of the file for
every hard-linked reference, one per mailbox, which accounts for the
inflated use of space.
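
For anyone who wants to see the effect on a small scale, here is a rough
sketch. The directory layout (store, mbox1, mbox2) is made up for
illustration and is not our real message store; it assumes GNU find, and
it only runs the rsync part if rsync is installed.

```shell
#!/bin/sh
# Hypothetical layout: one stored message hard-linked into two mailboxes.
demo=$(mktemp -d)
mkdir -p "$demo/src/store" "$demo/src/mbox1" "$demo/src/mbox2"
dd if=/dev/zero of="$demo/src/store/msg" bs=1024 count=1024 2>/dev/null
ln "$demo/src/store/msg" "$demo/src/mbox1/msg"
ln "$demo/src/store/msg" "$demo/src/mbox2/msg"

# Count names versus distinct inodes (-printf requires GNU find).
count_names()  { find "$1" -type f | wc -l; }
count_inodes() { find "$1" -type f -printf '%i\n' | sort -u | wc -l; }

names=$(count_names "$demo/src")
inodes=$(count_inodes "$demo/src")
echo "source: names=$names inodes=$inodes"

# Only if rsync is installed: copy once without -H and once with it.
if command -v rsync >/dev/null 2>&1; then
  rsync -a  "$demo/src/" "$demo/plain/"
  rsync -aH "$demo/src/" "$demo/linked/"
  echo "without -H: inodes=$(count_inodes "$demo/plain")"
  echo "with -H:    inodes=$(count_inodes "$demo/linked")"
fi
```

On the source, all the names should share a single inode. The copy made
without -H should end up with one inode per name, while the copy made
with -H should keep them linked together.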

Thanks everyone for the help and expertise!

-Tom

> -----Original Message-----
> From: Jamie Lokier [mailto:jamie at shareable.org]
> Sent: Wednesday, May 09, 2007 6:28 PM
> To: Tom Riley
> Cc: Matt McCutchen; rsync at lists.samba.org
> Subject: Re: rsync mechanics question
> 
> Tom Riley wrote:
> > However, the curiosity comes in with my source data taking up 86 gigs
> > of data on a 100g partition, and as the copy progresses the destination
> > drive is reporting 240 gigs of usage.
> >
> > So as far as I can tell, rsync is working and the data integrity seems
> > good, it's simply taking up 2.5 times the space.
> 
> Do you need the -S (--sparse) option?
> 
> Omitting this, when some of the source files are sparse, is one reason
> files take more space when they are copied on unix in general.  If
> there are sparse files, this will reduce their size at the destination
> to something more reasonable, but I don't know if they'll be exactly
> the same size.
> 
> Secondly, do you need the -H (--hard-links) option?
> 
> Omitting this, when some of the source files are hard linked, would
> cause multiple copies of the same file to be created on the destination.
> 
> To be sure of a clean copy with -S and -H, I think you need to start
> with an empty destination, the first time.  This will show you if
> those options have helped.
> 
> You can check if these options are relevant without actually copying,
> using "du" to get number of inodes and number of bytes used on the
> source disk, "find . | wc -l" to get the number of inodes
> (approximately) that will be created without -H, and "find . -printf
> '.+((%s+4095)/4096*4096)\n' | bc | tail -n1" (works on Linux
> anyway) to get the number of bytes (approximately) that will be
> created without -S and -H both.
> 
> > This crosses realms of expertise that I'm a bit light on, and am fast
> > coming up to speed on. I'm trying to determine if there is some mechanic
> > within the rsync process that could account for the used space. James
> > mentioned that rsync creates temp files which could account for double
> > disk usage, and I'm following up on that.
> 
> It only creates one temp file at a time, though, and moves it into
> place before starting the next one.  So if the largest individual file
> is 1G, you'd only expect 1G at most extra during the transfer, and
> nothing by the end.  It cannot possibly explain taking 2.5 times the
> space.
> 
> -- Jamie
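
The pre-copy check Jamie describes above can be sketched as a small
script. This is only a sketch under assumptions: it needs GNU find, it
assumes a 4096-byte allocation unit as in his example, it counts regular
files only, and it sums with awk instead of bc. The function name
estimate_copy_bytes is made up for illustration.

```shell
#!/bin/sh
# Estimate the bytes a copy would occupy with neither -H nor -S:
# every name counts as its own file, and each file's apparent size
# is rounded up to a whole 4096-byte block (assumed block size).
estimate_copy_bytes() {
  find "$1" -type f -printf '%s\n' |
    awk '{ total += int(($1 + 4095) / 4096) * 4096 }
         END { printf "%.0f\n", total }'
}

src=${1:-.}

# Every name under the tree; without -H each becomes a separate inode.
est_names=$(find "$src" | wc -l)

# Space the source occupies now (du counts hard-linked files once).
used_kb=$(du -sk "$src" | cut -f1)

echo "names in tree:        $est_names"
echo "estimated copy bytes: $(estimate_copy_bytes "$src")"
echo "currently used bytes: $((used_kb * 1024))"
```

If the estimate comes out far above what du reports for the source, then
hard links, sparse files, or both are probably in play, and -H and -S
are worth trying.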