preserving Mac OS X metadata in rsync backups and restores

Moritz Heckscher moritz.heckscher at gmx.de
Sat Jan 19 00:37:15 GMT 2008


On 2008-01-18, at 20:56, David Miller wrote:

> On Jan 18, 2008, at 8:14 AM, Moritz Heckscher wrote:
>
>> Hello all,
>>
>> I'm new to the list, but have done quite a bit of research
>> regarding the support of Mac OS X specific features (resource
>> forks, extended attributes, ACLs, file creation & modification
>> dates).
>>
>> By reading the archives, I get the impression that the current
>> version of rsync, 3.0.0pre8, has come quite far in this respect.
>> At least it sounds so, and I thank the developers very much for
>> this! I like your approach much more than the (very buggy) one
>> originally pursued by Apple (storing metadata in separate ._
>> files).
>>
>
> Be careful and test, test, test. I tried using pre8 to sync two
> local Xserve RAIDs (about 2TB of data) and I'm seeing these errors.
>
> rsync: writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32)
> [receiver] internal abbrev error!
> rsync error: error in rsync protocol data stream (code 12) at xattrs.c(565) [receiver=3.0.0pre8]
> rsync: connection unexpectedly closed (175959 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.0pre8]
>
> I have another Xserve RAID (about 1.3TB) and I don't get those
> errors when syncing with pre8. I'm trying to pin down what
> files/folders are causing the problem now.

Thanks for your feedback. I will certainly test a lot before going
into deployment. (Currently I'm waiting for a hard disk, so my
machine isn't even physically built yet.) I have found some
interesting resources on the metadata problem which you might also
find useful for isolating the problem on your machines:

1) A detailed post about which kinds of metadata exist on Mac OS X  
and about how poorly almost all programs handle them:

<http://blog.plasticsfuture.org/2006/03/05/the-state-of-backup-and-cloning-tools-under-mac-os-x/>

The post is rather old (March 2006, i.e. OS X 10.4.5/10.4.6), but
from my research it seems that basically nothing has changed since
then (a real shame, I say!). I am fairly certain nothing substantial
has changed within 10.4. I am also not sure whether the cp and rsync
programs shipped with OS X have been fixed in 10.5, or whether Apple
has invested all its energy in getting things right in its own "Time
Machine" program at the cost of neglecting the standard tools.

2) Almost a year later, someone built a test set containing files
with (almost?) all possible Mac OS X metadata:

<http://inik.net/node/150>

One could transfer this small set of files back and forth and check
where the problems are.
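
One could, for example, do something along these lines (a sketch; the
host name and paths are made up, and it assumes rsync 3.0.0pre8 on
both ends):

    # Push the test set to the server, preserving ACLs (-A) and
    # extended attributes (-X), which carry the resource forks
    rsync -aAX testfiles/ server:/tmp/testfiles/

    # Pull everything back into a second local directory
    rsync -aAX server:/tmp/testfiles/ testfiles-roundtrip/

    # Compare ACLs and extended attributes of the two copies
    ls -le@ testfiles/ testfiles-roundtrip/

Any difference between the two listings would point at metadata lost
in transit.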

3) To help with the checking/comparing, someone else built a little  
tool. It not only creates a collection of test files but can also  
compare the original and transferred versions afterwards:

<http://www.n8gray.org/blog/2007/04/27/introducing-backup-bouncer/>

I haven't tested these yet, but will once my hardware is ready.

All in all, the metadata situation on Mac OS X is and has been a
total mess. It's almost impossible to make true backups ("true"
meaning all metadata stays intact along with the data). Sure, you
usually don't need all the metadata, and people are happy if they can
at least recover the data fork. But on the other hand, all this
metadata exists and is used by all sorts of old and new programs, so
keeping it should be possible.
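
For anyone who wants to look at the various kinds by hand on a
client, these are the commands I've collected so far (a sketch;
GetFileInfo comes with Apple's Developer Tools, and the options to ls
are the OS X ones):

    # Extended attribute names and sizes in the long listing
    ls -l@ somefile

    # The resource fork, exposed as a named fork in the file system
    ls -l somefile/..namedfork/rsrc

    # ACL entries
    ls -le somefile

    # Finder flags, type/creator codes and creation date
    GetFileInfo somefile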

>> I plan to do the following:
>>
>> * Run a Linux server (Ubuntu, I guess, on an ext3 root partition)
>> with two separate internal ATA hard disks formatted with XFS and
>> configured as a software RAID to store the actual backup data. (As
>> I understand it, I should use XFS rather than ext3 because XFS
>> supports extended attributes large enough to also hold larger
>> converted Mac resource forks.)
>>
>> * Back up from different Mac OS X clients (currently all on 10.4,
>> but I might upgrade them to 10.5 later) to the server using rsync
>> over ssh. This should hopefully preserve (most of) the
>> Mac-specific metadata on the server. (Actually I plan to use
>> rsnapshot, but I believe that if I have the newest version of
>> rsync installed and possibly tell rsnapshot to use the appropriate
>> rsync options, things will be the same.)
>>
>> Now my question is the following:
>>
>> 1) What would I have to do to ensure the metadata is also restored
>> correctly? I assume I will have to use rsync for restoring as
>> well, and that if I just copy the data back (using, e.g., scp or
>> an AFP, CIFS, or NFS network mount), I will lose this metadata. Is
>> this correct?
>>
>
> Why not use rsync 3 for both backup and restore? Either use ssh
> (rsync -azXA --delete /path/to/source server:/path/to/target) or
> set up an rsync daemon server. This way you let rsync handle the
> metadata.

Yes, that's what I meant: using rsync in both directions should (!)
keep things intact. I was asking about the other alternatives
because, while I will set up the server myself, it might be necessary
for users to restore their own backups, and I don't think they'd be
able to use rsync successfully on the command line.
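
For an administrator-driven restore, I understand the reverse
direction would look roughly like this (a sketch; the host and paths
are placeholders):

    # Restore a user's files, preserving ACLs and extended
    # attributes; the trailing slash on the source copies its
    # contents rather than the directory itself
    rsync -aAX server:/backups/mymac/current/Users/joe/ /Users/joe/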

So I was thinking of maybe publishing the backup directory on the
server as a read-only network share that users could mount on their
client machines. If they then restore files (over AFP, SMB, ...),
will this destroy the metadata that rsync had previously stored? I
assume so. (Yes, I know, I can test this myself, and will when
possible; I'm just hoping to get advice from seasoned
rsync/network/Mac gurus here beforehand...)
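
When my hardware is ready I plan to check this with something like
the following (a sketch; /Volumes/backup stands for the mounted share
and the file names are made up):

    # Restore one file from the mounted share with plain cp
    cp /Volumes/backup/testfile ~/testfile-via-share

    # Restore the same file with rsync over ssh for comparison
    rsync -aAX server:/backups/testfile ~/testfile-via-rsync

    # If the share strips metadata, the xattr listings should differ
    ls -l@ ~/testfile-via-share ~/testfile-via-rsync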

>> Another problem I'm thinking about is that rsnapshot would have to
>> run on the server to "pull" the backups over the network; one
>> cannot run it on the clients and "push" the data to the server.
>> Pushing is what I'd prefer, because I plan not to leave the server
>> on all day but rather have it woken up when needed by the (laptop)
>> clients, which would also take care of scheduling the backups
>> (using anacron or launchd etc.). One could, however, run rsnapshot
>> on the clients to back up onto a locally attached storage device.
>>
>
> You don't need rsnapshot. Use the --link-dest option to create
> incremental backups.

Thanks for pointing this out. I had looked through the rsync man page
a few times, but, you know, at the beginning it's a little
complicated to understand all the options...
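
For the record, my current understanding of --link-dest (a sketch;
the directory layout and host are invented, run on the server pulling
from a client):

    # Files unchanged since the previous snapshot are hard-linked
    # instead of copied, so every snapshot looks complete but only
    # changed files take up new space
    rsync -aAX \
        --link-dest=/backups/mymac/2008-01-18 \
        client:/Users/ /backups/mymac/2008-01-19/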

Anyway, I think rsnapshot would still be a good (better) solution for
me because it handles all the rotating of daily/weekly/monthly
backups. I browsed through the Perl code a few days ago and saw that
it is more than 5000 lines long. If several smart people have worked
on this problem for years and produced a heavily tested script, that
should be more reliable than a 20-line shell script I roll myself.
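
If it works out, I imagine the relevant part of rsnapshot.conf would
look something like this (a sketch; I haven't verified these exact
values yet, and rsnapshot insists on tabs, not spaces, between
fields):

    snapshot_root   /backups/
    cmd_rsync       /usr/local/bin/rsync
    # rsnapshot's default long options plus the ACL and xattr flags
    rsync_long_args --delete --numeric-ids --relative --delete-excluded --acls --xattrs
    interval        daily   7
    interval        weekly  4
    backup          client:/Users/  mymac/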

My clients are laptop machines which are not always on, so I expect a  
lot of interrupted or skipped backups. That's difficult to deal with.

>> This leads me to the second question:
>>
>> 2) If I mount the server as a network drive on the clients using
>> AFP, SMB/CIFS, NFS, ..., and then back up to this 'locally
>> attached' drive with rsync (via rsnapshot), will I lose the
>> metadata because of the transfer through the AFP/SMB/... layer?

See above. If anyone can confirm that I will lose the metadata, I'd
be grateful.

-Moritz

>> Thanks a lot for a great program!
>> -Moritz
>>
>
> David Miller.