Issue with hard links, please help!

Tony Abernethy tony at servacorp.com
Sun May 14 23:37:03 GMT 2006


Max Kipness wrote:
>
> > You could of course (right after an rsync run) do a
> > "cd newdir; find . -type f -links 1 -print" and then randomly check a
> > couple and compare all their attributes such as mtime, permissions to
> > the previous dir. (I still recommend using the --link-dest thing over
> > using cp -al first.)
>
> Ok, I think I've figured out the problem with this one, although I'm not
> exactly sure of the reason. I have now started using --link-dest and
> this works great. Here again is the stat screen:
>
> Number of files: 50285
> Number of files transferred: 38
> Total file size: 16193254538 bytes
> Total transferred file size: 4077908049 bytes
> Literal data: 86201342 bytes
> Matched data: 3989904700 bytes
> File list size: 945440
> File list generation time: 6.615 seconds
> File list transfer time: 0.000 seconds
> Total bytes sent: 87436048
> Total bytes received: 539014
>
> sent 87436048 bytes  received 539014 bytes  97913.26 bytes/sec
> total size is 16193254538  speedup is 184.07
>
> Well, it ends up that there is a Microsoft backup file (a .bkf file)
> that is around 4GB in size that is being changed daily.
>
> Now my question (I think the final one) is why the entire file seems to
> be transferred even though rsync obviously detects that only a fraction
> of the file has changed. Evidently the Literal Data shows 86201342 of
> changes which appears correct. Also, since I'm using option
> --log-format="%f %l %b", I see on the file in question, the following
> results:
>
> SERVER/E$/exchange.bkf 4076087296 86454659
>
> Isn't this stating that the file size is 4076087296, and the changes to
> the file are 86454659?
>
> So why is the entire file transferring each day? I'm using the
> --no-whole-file option. Here are the rsync command options I used for
> the latest test:

Rsync has NO guarantee that the only changes are at the END of the file.
Rsync has to work when the changes are at the beginning or scattered
throughout.
Rsync goes to a lot of trouble to find and transmit only the changes.
This is extremely useful over slow and/or erratic network connections.
Over gigabit ethernet, it is probably significantly slower than simply
copying the whole file.
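
To illustrate what "find and transmit only the changes" involves, here is a
rough sketch in Python. It is NOT rsync's actual implementation: real rsync
pairs a cheap rolling checksum with a stronger one, while this toy version
simply rehashes a window at every offset. The names BLOCK, signatures, delta
and patch are made up for the example. It does show that unchanged blocks are
recognised wherever they end up in the new file, not just at the end.

```python
import hashlib

BLOCK = 4  # tiny block size, chosen only to keep the demo readable


def signatures(old: bytes, block: int = BLOCK) -> dict:
    """Map the digest of each full block of `old` to its block index."""
    sigs = {}
    for i in range(0, len(old) - block + 1, block):
        sigs.setdefault(hashlib.sha256(old[i:i + block]).digest(), i // block)
    return sigs


def delta(old: bytes, new: bytes, block: int = BLOCK) -> list:
    """Describe `new` as references to blocks of `old` plus literal bytes."""
    sigs = signatures(old, block)
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        idx = None
        if len(chunk) == block:
            idx = sigs.get(hashlib.sha256(chunk).digest())
        if idx is None:
            # No old block starts here: keep one byte as literal data
            # and slide the window forward by a single byte.
            literal.append(new[pos])
            pos += 1
        else:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("match", idx))
            pos += block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "match":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


old = b"AAAABBBBCCCCDDDD"
new = b"xxAAAABBBBCCyCDDDD"   # bytes inserted at the front and in the middle
ops = delta(old, new)
literal_bytes = sum(len(v) for kind, v in ops if kind == "literal")
matched_bytes = sum(BLOCK for kind, v in ops if kind == "match")
```

In this sketch literal_bytes plays the role of the "Literal data" figure in
your stats and matched_bytes the role of "Matched data". Note that patch()
still writes out a complete new copy of the file on the receiving side.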

Also, be aware that of the times that are representable in Unix,
DOS and its derivatives are only capable of representing half of them.
Depending on the circumstances, you may have DOS files that are always seen
as being different because the times do not and cannot match.
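
A small Python sketch of the timestamp problem, under some assumptions: FAT
stores modification times with 2-second granularity, and I am assuming here
that an odd second gets truncated down to the even one (an implementation
might round the other way). The function names and the sample timestamp are
made up. The comparison mimics what --modify-window is documented to do:
treat two times as equal if they differ by no more than the window.

```python
def to_two_second_resolution(mtime: int) -> int:
    """Drop the odd second, as a FAT-style filesystem might (assumption)."""
    return mtime - (mtime % 2)


def same_mtime(a: int, b: int, modify_window: int = 0) -> bool:
    """Treat two timestamps as equal if they differ by at most the window."""
    return abs(a - b) <= modify_window


unix_mtime = 1147649823                      # hypothetical odd-second mtime
dos_mtime = to_two_second_resolution(unix_mtime)

exact_match = same_mtime(unix_mtime, dos_mtime)                    # strict compare
window_match = same_mtime(unix_mtime, dos_mtime, modify_window=1)  # as with --modify-window=1
```

With a strict comparison the file looks changed on every run; with a window
of 1 second, as in the command quoted below, the two times compare as equal.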

>
> rsync /share/ /backup/05-13-2006/ -v --link-dest=/backup/05-12-2006/
> --stats --recursive --archive --times --modify-window=1 --delete
> --ignore-errors --files-from=/var/www/html/backup/adlist.txt
> --exclude-from=/scripts/file-exclude --no-whole-file --log-format="%f %l
> %b" 2> errors.log 1> stats.log
>
> In the previous posts I stated that du showed every incremental
> directory to be around 4-5gb in size. This is because each day the
> exchange.bkf has some change associated with it, so I guess the file
> cannot be linked. So in reality if you have very large files that have
> very small changes applied, hard-links really serve no purpose, correct?
> And I assume there is nothing else that can be done with these large
> files to conserve space?
Hard links are how Unix names files (the file itself).
Hard links allow one file to have more than one name.
Any change to the file (by any name) is done to the file and shows up under
all the other names.
When the last name (actually reference) is deleted, the file is deleted.
There is no "yes, but" associated with hard links.
Hard links will not help save space on files that are similar but not
exactly the same.
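
If you want to see this for yourself, here is a minimal Python sketch that
runs in a throwaway temporary directory on a Unix-like filesystem. The file
names are hypothetical and only echo your dated backup directories.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "05-12-2006.bkf")    # hypothetical names
    second = os.path.join(d, "05-13-2006.bkf")

    with open(first, "w") as f:
        f.write("monday")

    os.link(first, second)                       # second name, same file

    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    links_before = os.stat(first).st_nlink

    # Append through the second name ...
    with open(second, "a") as f:
        f.write(" +tuesday")

    # ... and the change is visible through the first name.
    with open(first) as f:
        seen_via_first = f.read()

    # Deleting one name leaves the file reachable through the other.
    os.remove(first)
    links_after = os.stat(second).st_nlink
    with open(second) as f:
        still_there = f.read()
```

Both names always show identical contents, so a file that differs from
yesterday's copy by even one byte cannot share a hard link with it and has
to be stored in full.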

>
> Thanks
> Max


