Issue with hard links, please help!

Tony Abernethy tony at servacorp.com
Mon May 15 00:47:01 GMT 2006


Recheck the statistics:
4GB file  something like 4,000,000,000 bytes
Total bytes sent:		 87,436,048  -- MUCH LESS than 4 GB
Total bytes received:	 539,014
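
For anyone who wants to redo the arithmetic, here is a rough sketch in Python. The numbers are copied from the --stats output quoted below; the variable names are mine, and I am assuming that rsync's bytes/sec figure is the bytes on the wire divided by elapsed time and that "speedup" is total size divided by bytes on the wire.

```python
# Figures copied from the --stats output quoted in this thread.
total_size = 16_193_254_538        # "Total file size"
transferred_size = 4_077_908_049   # "Total transferred file size"
literal = 86_201_342               # "Literal data"
sent = 87_436_048                  # "Total bytes sent"
received = 539_014                 # "Total bytes received"
rate = 97_913.26                   # bytes/sec from the summary line

# Assumption: rate = (sent + received) / elapsed seconds.
wire_bytes = sent + received
speedup = total_size / wire_bytes
elapsed_minutes = wire_bytes / rate / 60

# Time to push the whole transferred size at the same rate.
full_copy_hours = transferred_size / rate / 3600

# Share of the transferred files that actually had to cross the wire.
literal_share = literal / transferred_size

print(f"speedup          {speedup:.2f}")
print(f"elapsed          {elapsed_minutes:.1f} minutes")
print(f"full copy        {full_copy_hours:.1f} hours at the same rate")
print(f"literal share    {literal_share:.1%}")
```

The computed speedup agrees with the 184.07 that rsync printed, which suggests the assumption about how the figures relate is reasonable.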

> -----Original Message-----
> From: rsync-bounces+tony=servacorp.com at lists.samba.org
> [mailto:rsync-bounces+tony=servacorp.com at lists.samba.org]On Behalf Of
> Max Kipness
> Sent: Sunday, May 14, 2006 6:56 PM
> To: Tony Abernethy; rsync at lists.samba.org
> Subject: RE: Issue with hard links, please help!
>
>
> > > Number of files: 50285
> > > Number of files transferred: 38
> > > Total file size: 16193254538 bytes
> > > Total transferred file size: 4077908049 bytes
> > > Literal data: 86201342 bytes
> > > Matched data: 3989904700 bytes
> > > File list size: 945440
> > > File list generation time: 6.615 seconds
> > > File list transfer time: 0.000 seconds
> > > Total bytes sent: 87436048
> > > Total bytes received: 539014
> > >
> > > sent 87436048 bytes  received 539014 bytes  97913.26 bytes/sec
> > > total size is 16193254538  speedup is 184.07
> > >
> > > Well, it ends up that there is a Microsoft backup file (a .bkf file)
> > > that is around 4GB in size that is being changed daily.
> > >
> > > Now my question (I think the final one) is why the entire file seems to
> > > be transferred even though rsync obviously detects that only a fraction
> > > of the file has changed. Evidently the Literal Data shows 86201342 bytes
> > > of changes, which appears correct. Also, since I'm using option
> > > --log-format="%f %l %b", I see the following results for the file in
> > > question:
> > >
> > > SERVER/E$/exchange.bkf 4076087296 86454659
> > >
> > > Isn't this stating that the file size is 4076087296, and the changes to
> > > the file are 86454659?
> > >
> > > So why is the entire file transferring each day? I'm using the
> > > --no-whole-file option. Here are the rsync command options I used for
> > > the latest test:
> >
> > Rsync has NO guarantee that the only changes are to the END.
> > Rsync has to work when the changes are to the beginning or scattered
> > throughout.
> > Rsync goes to a lot of trouble to find and transmit only the changes.
> > This is extremely useful over slow and/or erratic network connections.
> > Over gigabit ethernet this is probably significantly slower than simply
> > copying the whole file.
> >
> > Also, be aware that of the times that are representable in Unix,
> > DOS and derivatives are only capable of representing half of them
> > (DOS timestamps have 2-second resolution).
> > Depending on whatever, you may have DOS files that are always seen
> > as being different because the times do not and cannot match.
>
> Rsync seems to be detecting what the changes are in this large file. Based
> on what you are saying, rsync in this case knows what the changes are in
> the file, roughly 86MB, but cannot transmit only the changes and
> therefore transmits the entire 4GB file? If there is no way around this,
> I guess I'll have to live with it.
Recheck the statistics.
It did in fact transmit only the changes.
(plus some traffic to know where the changes are)
And a fair amount of work on both sides to find the pieces of the 4GB file
that are the same.
Total bytes sent: 87436048
Total bytes received: 539014
 97913.26 bytes/sec    -- This is a combination of CPU work and
transmission.
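Dunno if your connection is about 1Mbps, but if it is, sending all 4GB would
take around 9 hours (over 11 hours at the 97913.26 bytes/sec shown above).
This run took about 15 minutes.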
The 4GB file will appear to be transferred in slow motion
as both sides cooperate in finding which parts are the same and which parts
are different.
There is no instant knowledge of which 87MB is different and needs to be
transferred.
4GB must be read from disk on BOTH sides to determine exactly which 87MB
that is.
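
To make the "both sides cooperate" part concrete, here is a toy sketch in Python. It is NOT rsync's actual algorithm (rsync pairs a cheap rolling checksum with a strong checksum); this version just hashes every window with MD5, and the function names, the 4-byte block size, and the sample data are all made up for illustration. It does show why every byte of both copies has to be read even when only a little literal data is sent.

```python
import hashlib

BLOCK = 4  # hypothetical tiny block size so the example is easy to follow


def block_signatures(old: bytes) -> dict:
    """Receiver side: read the whole old file and hash each full block."""
    sigs = {}
    for offset in range(0, len(old), BLOCK):
        block = old[offset:offset + BLOCK]
        if len(block) == BLOCK:
            sigs.setdefault(hashlib.md5(block).digest(), offset)
    return sigs


def delta(new: bytes, sigs: dict) -> list:
    """Sender side: read the whole new file, emitting matches or literals."""
    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        window = new[i:i + BLOCK]
        offset = None
        if len(window) == BLOCK:
            offset = sigs.get(hashlib.md5(window).digest())
        if offset is not None:
            if literal:  # flush pending literal bytes before the match
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("match", offset))
            i += BLOCK
        else:
            literal.append(new[i])  # no block starts here, send this byte
            i += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Receiver side: rebuild the new file from old blocks plus literals."""
    out = bytearray()
    for kind, value in ops:
        out += old[value:value + BLOCK] if kind == "match" else value
    return bytes(out)


old = b"AAAABBBBCCCCDDDD"
new = b"AAAAxxBBBBCCCCDDDD"  # two bytes inserted near the start
ops = delta(new, block_signatures(old))
literal_bytes = sum(len(v) for kind, v in ops if kind == "literal")
print(ops)
print("literal bytes:", literal_bytes, "of", len(new))
```

Only the two inserted bytes go out as literal data; everything else is sent as "use the block you already have at this offset", which corresponds to the Matched data line in the stats.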

>
> > > rsync /share/ /backup/05-13-2006/ -v --link-dest=/backup/05-12-2006/
> > > --stats --recursive --archive --times --modify-window=1 --delete
> > > --ignore-errors --files-from=/var/www/html/backup/adlist.txt
> > > --exclude-from=/scripts/file-exclude --no-whole-file
> > > --log-format="%f %l %b" 2> errors.log 1> stats.log\
> > >
> > > In the previous posts I stated that du showed every incremental
> > > directory to be around 4-5GB in size. This is because each day the
> > > exchange.bkf has some change associated with it, so I guess the file
> > > cannot be linked. So in reality if you have very large files that have
> > > very small changes applied, hard-links really serve no purpose, correct?
> > > And I assume there is nothing else that can be done with these large
> > > files to conserve space?
> > Hard links are how unix names files (the file itself)
> > Hard links allow one file to have more than one name.
> > Any change to the file (by any name) is done to the file and shows up in
> > all the other names.
> > When the last name (actually reference) is deleted, the file is deleted.
> > There is no "yes, but" associated with hard links.
> > Hard links will not help save space on similar but not exactly the same
> > files.
>
> That's what I figured, just wanted to clarify. So if you had a directory
> with 10 1GB files, and each day you made a 10k change to each, all your
> incremental directories would have 10GB total, nothing saved from
> hard-linking.
I think you misunderstand.
Without hard links you have NO access to ANY files.
It's not an additional way to access files.
It is THE way to access files that allows for more than one name.
If you have multiple directories that have identical files,
you can have one file with a name in each of the directories
and thereby save a lot of space. You make the "copies" by copying
the links which takes very little space.
ANY change, even to the access, permissions, or times associated with the
file, either shows up under all of its names or requires that the hard link
be broken and a fresh copy made.
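
If it helps to see it, here is a small sketch in Python of what I mean. The directory and file names are borrowed from this thread, the contents are made up, and it runs in a throwaway temp directory; it assumes a filesystem that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as backup:
    day1 = os.path.join(backup, "05-12-2006")
    day2 = os.path.join(backup, "05-13-2006")
    os.mkdir(day1)
    os.mkdir(day2)

    first = os.path.join(day1, "exchange.bkf")
    second = os.path.join(day2, "exchange.bkf")
    with open(first, "wb") as f:
        f.write(b"monday")

    # Give the same file a second name; no data is copied.
    os.link(first, second)
    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    names = os.stat(first).st_nlink

    # Changing the file in place through one name shows up under the other.
    with open(second, "r+b") as f:
        f.write(b"MONDAY")
    with open(first, "rb") as f:
        seen_via_first = f.read()

    # Breaking the link: write a fresh copy, then rename it over one name.
    fresh = second + ".tmp"
    with open(fresh, "wb") as f:
        f.write(b"tuesday")
    os.replace(fresh, second)
    with open(first, "rb") as f:
        still_first = f.read()
    names_after = os.stat(first).st_nlink

print(same_inode, names, seen_via_first, still_first, names_after)
```

Once the link is broken there are two separate files, each taking its own space. That is the situation with a 4GB file that changes every day: each incremental directory needs its own full copy.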
>
> Thanks again.
> Max


