Issue with hard links, please help!

Max Kipness max at assuredata.com
Sun May 14 23:56:10 GMT 2006


> > Number of files: 50285
> > Number of files transferred: 38
> > Total file size: 16193254538 bytes
> > Total transferred file size: 4077908049 bytes
> > Literal data: 86201342 bytes
> > Matched data: 3989904700 bytes
> > File list size: 945440
> > File list generation time: 6.615 seconds
> > File list transfer time: 0.000 seconds
> > Total bytes sent: 87436048
> > Total bytes received: 539014
> >
> > sent 87436048 bytes  received 539014 bytes  97913.26 bytes/sec
> > total size is 16193254538  speedup is 184.07
> >
> > Well, it ends up that there is a Microsoft backup file (a .bkf file)
> > that is around 4GB in size that is being changed daily.
> >
> > Now my question (I think the final one) is why the entire file seems to
> > be transferred even though rsync obviously detects that only a fraction
> > of the file has changed. Evidently the Literal Data shows 86201342 of
> > changes which appears correct. Also, since I'm using option
> > --log-format="%f %l %b", I see on the file in question, the following
> > results:
> >
> > SERVER/E$/exchange.bkf 4076087296 86454659
> >
> > Isn't this stating that the file size is 4076087296, and the changes to
> > the file are 86454659?
> >
> > So why is the entire file transferring each day? I'm using the
> > --no-whole-file option. Here are the rsync command options I used for
> > the latest test:
> 
> Rsync has NO guarantee that the only changes are to the END.
> Rsync has to work when the changes are to the beginning or scattered
> throughout.
> Rsync goes to a lot of trouble to find and transmit only the changes.
> This is extremely useful over slow and/or erratic network connections.
> This is probably significantly slower over gigabit ethernet.
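To make the point about scattered changes concrete, here is a much simplified sketch of block matching. It is not rsync's actual implementation: rsync pairs a cheap rolling checksum with a strong checksum, whereas this version simply hashes a window at every offset. The names `make_delta` and `apply_delta` and the 4-byte block size are made up for the illustration.

```python
import hashlib


def make_delta(old: bytes, new: bytes, block: int = 4):
    """Describe `new` as copies of blocks from `old` plus literal bytes."""
    # Index every full block of the old data by its hash.
    table = {}
    for offset in range(0, len(old) - block + 1, block):
        table.setdefault(hashlib.md5(old[offset:offset + block]).digest(), offset)

    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        match = None
        if len(chunk) == block:
            match = table.get(hashlib.md5(chunk).digest())
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))  # receiver already has this block
            pos += block
        else:
            literal.append(new[pos])  # unmatched byte must be sent
            pos += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops, block: int = 4) -> bytes:
    """Rebuild the new data from the old data and the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value:value + block] if kind == "copy" else value
    return bytes(out)


if __name__ == "__main__":
    old = b"AAAABBBBCCCCDDDD"
    new = b"AAAAxyBBBBCCCCDDDD"  # two bytes inserted near the start
    ops = make_delta(old, new)
    print(ops)
    print(apply_delta(old, ops) == new)
```

Because matches are looked for at every offset, an insertion near the start of the file still lets the later blocks be found and copied rather than resent.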
> 
> Also, be aware that of the times that are representable in Unix,
> DOS and derivatives are only capable of representing half of them.
> Depending on the filesystems involved, you may have DOS files that are
> always seen as being different because the times do not and cannot match.
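A small sketch of the timestamp issue, assuming the DOS-style filesystem keeps only even seconds (FAT stores times at 2-second resolution) and assuming it truncates rather than rounds. The helper names `fat_truncate` and `times_match` are invented here; `times_match` mirrors the documented idea behind --modify-window, that two times are treated as equal if they differ by no more than the window.

```python
def fat_truncate(unix_seconds: int) -> int:
    """Drop the odd second, as a 2-second-resolution filesystem would.

    Assumption: truncation toward the lower even second.
    """
    return unix_seconds - (unix_seconds % 2)


def times_match(a: int, b: int, modify_window: int = 0) -> bool:
    """Treat two timestamps as equal if they differ by at most the window."""
    return abs(a - b) <= modify_window


if __name__ == "__main__":
    unix_mtime = 1147650971            # an odd second on the Unix side
    dos_mtime = fat_truncate(unix_mtime)
    print(times_match(unix_mtime, dos_mtime))      # exact comparison
    print(times_match(unix_mtime, dos_mtime, 1))   # with --modify-window=1
```

With an exact comparison the file looks modified on every run; a window of 1 second absorbs the difference.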

Rsync seems to be detecting the changes in this large file. Based on
what you are saying, rsync in this case knows what the changes are in
the file, roughly 86 MB, but cannot transmit only the changes and
therefore transmits the entire 4 GB file? If there is no way around this,
I guess I'll have to live with it.
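For reference, the arithmetic on the --stats figures quoted above can be checked with a few lines. As far as I understand the output, "Total transferred file size" counts the full size of every file that was updated, while "Literal data" is the unmatched data actually sent and "Matched data" is what the receiver rebuilt from the copy it already had. The dictionary keys below are just labels I picked, not rsync names.

```python
# Figures copied from the --stats output quoted earlier in this message.
stats = {
    "total_file_size": 16193254538,
    "total_transferred_file_size": 4077908049,
    "literal_data": 86201342,
    "matched_data": 3989904700,
    "bytes_sent": 87436048,
    "bytes_received": 539014,
}


def speedup(s):
    """Total size of the file set divided by the bytes exchanged."""
    return s["total_file_size"] / (s["bytes_sent"] + s["bytes_received"])


def literal_fraction(s):
    """Share of the updated files' data that had to be sent as literal bytes."""
    return s["literal_data"] / (s["literal_data"] + s["matched_data"])


if __name__ == "__main__":
    print(f"speedup: {speedup(stats):.2f}")
    print(f"literal share: {literal_fraction(stats):.1%}")
```

The speedup works out to the same 184.07 that rsync reported, and the literal share is roughly 2% of the data in the updated files.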

> > rsync /share/ /backup/05-13-2006/ -v --link-dest=/backup/05-12-2006/
> > --stats --recursive --archive --times --modify-window=1 --delete
> > --ignore-errors --files-from=/var/www/html/backup/adlist.txt
> > --exclude-from=/scripts/file-exclude --no-whole-file
> > --log-format="%f %l %b" 2> errors.log 1> stats.log
> >
> > In the previous posts I stated that du showed every incremental
> > directory to be around 4-5gb in size. This is because each day the
> > exchange.bkf has some change associated with it, so I guess the file
> > cannot be linked. So in reality if you have very large files that have
> > very small changes applied, hard-links really serve no purpose, correct?
> > And I assume there is nothing else that can be done with these large
> > files to conserve space?
> Hard links are how unix names files (the file itself).
> Hard links allow one file to have more than one name.
> Any change to the file (by any name) is done to the file and shows up in
> all the other names.
> When the last name (actually reference) is deleted, the file is deleted.
> There is no "yes, but" associated with hard links.
> Hard links will not help save space on similar but not exactly the same
> files.

That's what I figured, just wanted to clarify. So if you had a directory
with ten 1 GB files, and each day you made a 10 KB change to each, every
incremental directory would hold the full 10 GB, with nothing saved from
hard-linking.
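A scaled-down sketch of that situation, with two 1 KB files standing in for the big ones. The directory and file names are made up, and the links are created by hand with os.link rather than by rsync's --link-dest. The unchanged file is hard-linked into the second day's directory and shares an inode; the changed file differs by a single byte but is a separate, full-size copy.

```python
import os
import tempfile


def write_file(path, data):
    with open(path, "wb") as fh:
        fh.write(data)


def unique_bytes(root):
    """Sum file sizes under root, counting each inode only once (like du)."""
    seen = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            seen[(st.st_dev, st.st_ino)] = st.st_size
    return sum(seen.values())


def demo_hard_links(workdir):
    day1 = os.path.join(workdir, "day1")
    day2 = os.path.join(workdir, "day2")
    os.mkdir(day1)
    os.mkdir(day2)

    # Unchanged file: day2 gets a second name for the same data.
    write_file(os.path.join(day1, "unchanged.dat"), b"x" * 1024)
    os.link(os.path.join(day1, "unchanged.dat"),
            os.path.join(day2, "unchanged.dat"))

    # Changed file: one byte differs, so day2 needs its own full copy.
    write_file(os.path.join(day1, "changed.dat"), b"y" * 1024)
    write_file(os.path.join(day2, "changed.dat"), b"y" * 1023 + b"z")

    return {
        "unchanged_shares_inode": os.path.samefile(
            os.path.join(day1, "unchanged.dat"),
            os.path.join(day2, "unchanged.dat")),
        "unchanged_link_count": os.stat(
            os.path.join(day2, "unchanged.dat")).st_nlink,
        "changed_shares_inode": os.path.samefile(
            os.path.join(day1, "changed.dat"),
            os.path.join(day2, "changed.dat")),
        "unique_bytes": unique_bytes(workdir),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_hard_links(tmp))
```

Four names of 1 KB each exist, but only three distinct files take up space: the linked file is counted once, the changed file twice.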

Thanks again.
Max

