Copying EAs and ACLs

jw schultz jw at pegasys.ws
Sun Mar 2 23:50:23 EST 2003


Access Control Lists (ACLs) and Extended Attributes (EAs) are
an area i have seen for some time as something rsync will
need to address.  I've put a tighter focus on this issue for
the past week or so and have reached a few conclusions.

1. ACL and EA OS support is growing but not really there yet.
	
	Most of the UNIX players have POSIX ACL support.
	I have no data on UNIX EA support but as far as i
	can tell it is mostly absent.

	Linux does not have consistent support yet.  ACL
	support is not part of mainstream production
	kernels.  If you want it you have to either apply
	patches or run a development kernel.  Some
	distribution kernels are patched with ACL support
	for one or more filesystems.

	I know that XFS supports ACLs and EAs, as do later
	versions of EXT2, EXT3 and JFS.  I'm not entirely
	sure of the status of reiserfs.

2. Utility support is almost completely missing.
	
	Not only does rsync not support ACLs and EAs yet, but
	neither do cpio, tar (with the exception of star)
	or most of the backup utilities.

3. Use lags support dramatically.

	The vast majority of sites use neither ACLs nor
	EAs even when they can.  The lack of utility
	support aggravates this.

	Until support is ubiquitous across production-grade
	OSs, filesystems and utilities the adoption of ACLs
	and EAs will be delayed.

4. ACLs and EAs are a part of the future.

	Users and admins are coming to Linux and UNIX with
	the expectation of ACLs.  While intelligent use of
	group IDs can deal more simply with _almost_ all
	permissions issues, and by being simpler tends to be
	more secure, many will prefer the quick fix ACLs
	provide.

	Some of the new security models i've seen are going
	to require both ACLs and EAs.

	The potential value of EAs in GUI environments
	should not be underestimated.  Imagine file-manager
	thumbnails and application icons that really are
	attached to the file.

	Even rsync may find good use for EAs.  I can
	envision optional storing of blocksums as an
	extended attribute.  A 64KB EA could support the
	blocksums for a 127MB file using 4byte sums and 16KB
	blocks.

	Who knows what else the future might hold once
	people expect EA support?
	
	Those applicable utilities that fail to support ACLs
	and EAs will become irrelevant.
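
The blocksum estimate in item 4 can be checked with a little
arithmetic.  The sketch below is not rsync code.  It assumes each
block stores two 4-byte sums (a rolling sum plus a truncated strong
sum), which is one reading that lands just above the quoted figure;
a single 4-byte sum per block would cover twice as much.  The
constant and function names are made up for illustration.

```python
EA_VALUE_LIMIT = 64 * 1024   # largest EA value mentioned in the text (64KB)
BLOCK_SIZE = 16 * 1024       # 16KB blocks, as in the text


def covered_bytes(bytes_per_block_sum: int) -> int:
    """File size whose per-block sums fit in one maximum-size EA value."""
    blocks = EA_VALUE_LIMIT // bytes_per_block_sum
    return blocks * BLOCK_SIZE


# Assumption: a 4-byte rolling sum plus a 4-byte strong sum per block.
two_sums = covered_bytes(8)
# Alternative reading: a single 4-byte sum per block.
one_sum = covered_bytes(4)

print(two_sums // 2**20, "MiB with 8 bytes of sums per block")
print(one_sum // 2**20, "MiB with 4 bytes of sums per block")
```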

So while demand is currently low i believe that rsync will
need to support ACLs and EAs in the near future or it will
become little more than a limited download tool like ftp.
When that will be is an unknown but i think the release of
the Linux 2.6 kernel will be a major factor.  This means
that the widespread use of ACLs on the filesystems may well
begin in as little as one year if 2.6 comes out on schedule.

Where does that leave us?  What am i going to do about it?

Based on the level of comments on this and other threads
there is little demand presently for ACL or EA support in
the rsync user or developer base.  There are a few people
for whom this is a production issue today and a few like
myself who see it as a future issue that should be
anticipated.

I do not use ACLs or EAs currently.  I believe that support
for them should be added sooner rather than later and i
think that how that support is implemented is very
important.  I however do not use them at this time and
unless someone will pay me to do so i'm not going to set up
a lab or start slinging code to add that support.  Sorry but
this just isn't my itch yet.  While i care how it is done i
don't care enough yet to do it myself and build the test
apparatus. 

The remainder of this missive shall be a bit more technical.
Since i do care how they are implemented and have some
informed ideas in that direction it seems good to me to
relate those ideas.   If you like design documents think of
this as a start on a high-level one.  I don't generally care
for design documents but the design should be discussed
before such significant code is generated.  If you aren't
interested in such discussion, or don't care what i have to
say please move on instead of complaining.  I've broken this
down into several sections.

If you have comments to make please address one issue per
followup so they form separate sub-threads and only quote the
relevant text.  Unless there are comments i expect this will
be the last i'll discuss this issue for a while.

	-- How have OSs implemented Access Control Lists --

In UNIX ACLs are implemented in various, sometimes
non-standard, ways.  For the most part it looks like they
are largely compatible with POSIX ACLs.

In Linux ACLs are implemented as an EA.  In order to
support ACLs on Linux and non-Linux platforms we have
to treat them separately from the EAs.

Although NTFS supports ACLs the cygwin environment does not
_yet_ reflect that.*** NTFS ACLs do not quite translate to
and from POSIX. The semantic differences mean that
information is lost translating in each direction.

Netware also has a form of ACLs similar to NTFS.

The order of the entries within a POSIX ACL does not affect
its meaning.  The only consistent compact expression of
them is textual in the form:

	[d:]type:id:perms
	type is one of u,g,o,m
	id is either the name or the ASCII id number.
	perms is the symbolic rwx string as shown in ls
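
To make the textual form concrete, here is a minimal sketch of
parsing and re-emitting one entry.  AclEntry, parse_entry and
format_entry are hypothetical names, not part of any ACL library,
and the sketch assumes an empty id field denotes the owning user or
group (as in "u::rwx").

```python
from typing import NamedTuple


class AclEntry(NamedTuple):
    default: bool    # True when the entry carries the "d:" prefix
    tag: str         # one of "u", "g", "o", "m"
    qualifier: str   # name or numeric id; empty for the owning entries
    perms: str       # symbolic "rwx" string, "-" for an absent bit


def parse_entry(text: str) -> AclEntry:
    """Parse one '[d:]type:id:perms' entry."""
    fields = text.strip().split(":")
    default = fields[0] == "d"
    if default:
        fields = fields[1:]
    if len(fields) != 3:
        raise ValueError(f"expected [d:]type:id:perms, got {text!r}")
    tag, qualifier, perms = fields
    if tag not in ("u", "g", "o", "m"):
        raise ValueError(f"unknown entry type {tag!r}")
    # Each position must hold its own letter or a dash.
    if len(perms) != 3 or any(p not in (c, "-") for p, c in zip(perms, "rwx")):
        raise ValueError(f"bad permission string {perms!r}")
    return AclEntry(default, tag, qualifier, perms)


def format_entry(entry: AclEntry) -> str:
    """Emit the entry back in the same textual form."""
    prefix = "d:" if entry.default else ""
    return f"{prefix}{entry.tag}:{entry.qualifier}:{entry.perms}"
```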

As near as i can tell the acl_t structure used by the
libraries is an opaque data type and working on it directly
is likely to break on some platforms and may be subject to
change in future.


	-- How have OSs implemented Extended Attributes --

EAs seem to be unsupported by most of the UNIX platforms.
Pipe up now if you know of support for them on any
mainstream UNIX.

Limited EAs are supported on the MacOS and OS-X in the form
of the resource fork.  I have a vague recollection that NTFS
also has some form of EAs.

In Linux EAs have been growing in importance.  In addition
to ACLs other security features are being implemented with
them.

Extended attributes are simple name, value pairs.  Names are
partitioned into namespaces ("user." and "system." at
present) and are null terminated, fully qualified text up to
256 bytes long.  Values are BLOBs up to 64KB long.  The
limits on the count and total size of a file's EAs are
implementation dependent (the total varies from 1KB to
unlimited).  The order in which they are retrieved is
indeterminate.


	-- What should rsync support --

Extended Attributes and POSIX ACLs.

Like it or not rsync is POSIX semantics oriented.  Further,
when running on the one major non-POSIX platform it does so
in a POSIX emulating environment (cygwin) so it doesn't even
have access to non-POSIX file semantics.  Full
cross-platform functionality isn't really rsync's bailiwick.
Perhaps a future tool that supersedes rsync might
interoperate with native, non-POSIX semantics but i don't
see rsync going there.

If all we supported were Linux we could simply provide
Extended attributes and the ACLs would be automatically
included.  However, we should support the other POSIX
(Unix mostly) platforms. 

It would be worthwhile to have some limited support for
non-POSIX ACLs in some cases.  I'll discuss that in the
"how" section.


	-- How should rsync support Access Control Lists --

As i've already said the focus should be on POSIX ACLs.

As much as i am inclined to deal with ACLs as binary data
	short	type/flags
	short	perms	
	int	id
or something similar, the library routines treat the acl_t
structure as an opaque object and don't really support that.
So we would wind up converting the acl_t to the text form
and then to the binary and vice versa.  I don't think the
compaction (10-15 chars/ACE -> 8 or 12 bytes/ACE) is worth
the extra computation.  If the acl_t structure turns out to
be less opaque i'd be more than happy to revisit the binary
format.
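
For a sense of scale, the binary layout above packs into 8 bytes per
entry.  This is only a sketch: the field widths, unsigned types and
little-endian byte order are assumptions, and pack_ace/unpack_ace
are made-up names.

```python
import struct

# Assumed layout: unsigned short type/flags, unsigned short perms,
# unsigned int id, little-endian with no padding.
ACE_FORMAT = "<HHI"


def pack_ace(type_flags: int, perms: int, ident: int) -> bytes:
    """Pack one access control entry into its binary form."""
    return struct.pack(ACE_FORMAT, type_flags, perms, ident)


def unpack_ace(blob: bytes):
    """Return (type_flags, perms, ident) from a packed entry."""
    return struct.unpack(ACE_FORMAT, blob)


binary = pack_ace(1, 0o6, 1000)
text = "u:jschultz:rw-"          # an illustrative entry in the text form
print(len(binary), "bytes binary versus", len(text), "chars of text")
```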

Given the potential size of an ACL we don't want to transmit
the ACL for every file that has an extended one.  An
extended ACL is one with entries not in the standard POSIX
UGO permissions mask.  What we should do instead is assemble
each ACL into a consistent, sorted form that lends itself to
checksum comparison.  A single block checksum for a file's
ACL would allow us to identify those files whose ACLs had
changed.  This way we would only transmit an ACL if it had
changed.
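
A minimal sketch of the sorted form plus checksum idea follows.  It
works on the textual entries and uses MD5 from hashlib purely as a
stand-in; which checksum rsync would actually use here is not
decided above.  canonical_acl and acl_digest are hypothetical names.

```python
import hashlib


def canonical_acl(entries):
    """Return the ACL as sorted, duplicate-free, newline-terminated text."""
    cleaned = sorted({e.strip() for e in entries if e.strip()})
    return "".join(e + "\n" for e in cleaned).encode("utf-8")


def acl_digest(entries):
    """Single checksum over the canonical form (MD5 is a stand-in)."""
    return hashlib.md5(canonical_acl(entries)).digest()


sender = ["u:jw:rw-", "g:staff:r--", "m::rw-"]
receiver = ["m::rw-", "g:staff:r--", "u:jw:rw-"]   # same ACL, other order

# Only when the digests differ would the ACL itself need to be sent.
needs_transfer = acl_digest(sender) != acl_digest(receiver)
```

Because the entries are sorted before hashing, the order in which
the OS hands them back does not matter.  Whether names or numeric
ids go into the text is a separate question this sketch ignores.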

Limited support for non-POSIX ACLs could be provided.
During the protocol discovery phase** ACL capabilities should
be determined.  If two like systems (windows<->windows for
example) are communicating they could use their native ACL
format instead of POSIX.  Otherwise non-POSIX systems would
be expected to convert their ACL format to and from our
chosen POSIX representation.  
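
The selection rule described above is small enough to sketch.  The
format labels ("posix", "ntfs") and choose_acl_format are invented
for illustration; nothing like this exists in the rsync protocol
today as far as this note is concerned.

```python
POSIX = "posix"   # the common representation both sides fall back to


def choose_acl_format(local_native: str, remote_native: str) -> str:
    """Pick the ACL wire format once both ends have announced theirs."""
    if local_native == remote_native:
        # Like systems can exchange their native format losslessly.
        return local_native
    # Otherwise each non-POSIX side converts to and from POSIX.
    return POSIX
```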


	-- How should rsync support Extended Attributes --

Extended attributes can be quite large.  The EAs of a file
should be built into a single contiguous object.  The object
would be built with the extended attributes sorted by name.
This EA object would be compared and transmitted just like a
regular file using the rsync algorithm.  Possible
alternatives would be to generate a separate checksum for
each EA or to somehow start a new blocksum with each EA to
take advantage of the fact that each EA is a discrete entity
requiring a separate syscall to retrieve, remove, set or
change.
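
One way the single contiguous EA object might look is sketched
below.  The record layout (2-byte name length, 4-byte value length,
then name and value) is an assumption made for the example, as are
the function names; the only parts taken from the text are that the
attributes are sorted by name and concatenated.

```python
import struct

HEADER = "<HI"   # assumed: name length (2 bytes), value length (4 bytes)


def build_ea_object(attrs: dict) -> bytes:
    """Serialize {name: value-bytes} into one blob, sorted by name."""
    parts = []
    for name in sorted(attrs):
        raw_name = name.encode("utf-8")
        value = attrs[name]
        parts.append(struct.pack(HEADER, len(raw_name), len(value)))
        parts.append(raw_name)
        parts.append(value)
    return b"".join(parts)


def parse_ea_object(blob: bytes) -> dict:
    """Inverse of build_ea_object."""
    attrs = {}
    offset = 0
    while offset < len(blob):
        name_len, value_len = struct.unpack_from(HEADER, blob, offset)
        offset += struct.calcsize(HEADER)
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        attrs[name] = blob[offset:offset + value_len]
        offset += value_len
    return attrs
```

Sorting makes the blob independent of the order in which the OS
returns the attributes, so two files with the same EAs produce
byte-identical objects.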

Where ACLs are being supported by EAs the ACL EA(s) would
not be included in the EA synchronization.  For this reason
it will be necessary to write the ACLs after the EAs.  So
the order of ops on a file update would be file-data, EA,
ACL, chown, chgrp, chmod, and finally mtime.
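
As a sketch of the exclusion, on Linux the POSIX ACLs are carried in
the attributes system.posix_acl_access and system.posix_acl_default,
so those two names can be split off before the EA object is built.
split_acl_attrs and UPDATE_ORDER are hypothetical names; the order
itself is the one given above.

```python
# Linux EA names that carry POSIX ACLs.
ACL_EA_NAMES = frozenset({
    "system.posix_acl_access",
    "system.posix_acl_default",
})

# Order of operations on a file update, as listed in the text.
UPDATE_ORDER = ("file-data", "EA", "ACL", "chown", "chgrp", "chmod", "mtime")


def split_acl_attrs(attrs: dict):
    """Separate ACL-carrying EAs from the ones synchronized as EAs."""
    plain = {n: v for n, v in attrs.items() if n not in ACL_EA_NAMES}
    acl = {n: v for n, v in attrs.items() if n in ACL_EA_NAMES}
    return plain, acl
```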


		-- Unresolved issues --

A drawback to the checksum approach to EA and ACL objects is that
it would often be necessary to create the objects twice on
the receiver, once to generate the blocksums and a second
time to merge any changes.

Also at issue is whether and what timestamps are modified by
changes to ACLs and EAs.  Such updates should change ctime
but not necessarily mtime.  If that is the case it may be
necessary to generate a total checksum and size on the sender
during the file-scan phase or things could get quite ugly.
Such an early sender checksum would mean the objects might
have to be generated twice on the sender as well.

The regeneration may be partially avoidable by allowing a
limited amount to be attached to the flist or otherwise
cached.  The ACLs in particular may be small enough to keep.
Because ACLs and EA lists will often be the same on many
files a hash of the ACL and EA objects could be maintained
and duplicates identified, reducing inter-phase storage
requirements.
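
The duplicate elimination could be as simple as a table keyed by
digest, with each file entry holding only the key.  This is a
sketch under assumptions: ObjectCache is an invented name, MD5 is a
stand-in for whatever hash would really be used, and collisions are
ignored.

```python
import hashlib


class ObjectCache:
    """Keep one copy of each distinct ACL or EA object between phases."""

    def __init__(self):
        self._by_digest = {}

    def intern(self, blob: bytes) -> bytes:
        """Store blob if it is new and return the key files refer to."""
        key = hashlib.md5(blob).digest()
        self._by_digest.setdefault(key, blob)
        return key

    def get(self, key: bytes) -> bytes:
        return self._by_digest[key]

    def __len__(self) -> int:
        return len(self._by_digest)


cache = ObjectCache()
shared_acl = b"g:staff:r--\nu:jw:rw-\n"
# Three files with the same ACL cost one stored object and three keys.
keys = [cache.intern(shared_acl) for _ in range(3)]
other = cache.intern(b"u:jw:r--\n")
```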

			--------

** Some of my comments may conflict with the rsync
protocol.  This is because i have not so far needed to get
into the protocol itself and i have yet to see any decent
documentation on the protocol.  Tridge's algorithm is well
enough documented but the protocol implementing it is not.

*** I don't use windows so my cygwin and NTFS knowledge is
limited.


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt

