[PATCH 0/3] Extended file stat functions [ver #2]

Andreas Dilger adilger at dilger.ca
Wed Jun 30 05:04:40 MDT 2010

On 2010-06-29, at 19:16, David Howells wrote:
> Implement a pair of new system calls to provide extended and further extensible stat functions.
> The third of the associated patches provides these new system calls:
> 	struct xstat_dev {
> 		unsigned int	major;
> 		unsigned int	minor;
> 	};

Doesn't glibc use two 64-bit values for devices?

> 	struct xstat {
> 		unsigned int		struct_version;

I dislike sequential "version" fields (which are "all or nothing"), and prefer the ext2/3/4-like "feature flags" that allow the caller to state what features and fields it expects and/or understands.  This allows extensibility without unduly breaking compatibility.

> 		unsigned int		st_mode;

Having a separate MODE flag would be great for "ls --color", since that is basically the only information that it needs that isn't already available in the readdir() output.

> 		unsigned int		st_nlink;
> 		unsigned int		st_uid;
> 		unsigned int		st_gid;

In struct stat64 it uses "unsigned long" for both st_uid and st_gid.  Having a 64-bit value here is useful for CIFS servers to be able to remap different UID domains into a 32-bit domain and a 32-bit UID.  If you change this, please remember to reorder the fields for proper 64-bit alignment.

> 		unsigned int		st_blksize;
> Does st_blksize really need to be 64 bits on a 64-bit system?

I don't think so, but adding a 32-bit padding couldn't hurt.

> 		unsigned long long	st_ino;
> 		unsigned long long	st_size;
> Should the inode number and data version number fields be 128-bit?

I wouldn't object to having a 128-bit st_ino field, since this is what Lustre will be using internally in the next release.

Similarly, _filesystems_ are not SO far from hitting the 64-bit size limit (a Lustre filesystem will likely hit 100PB ~= 2^57 bytes in the next year), so having a 128-bit st_size wouldn't be unreasonable, because...

What is also very convenient that I learned Solaris stat() does is it returns the device size in st_size for a block device file.  This is very convenient, and avoids the morass of ioctls and "binary llseek guessing" used by libext2fs and libblkid to determine the size of a block device.  Any reason not to add this into this new syscall?

> 		unsigned long long	st_blocks;

If st_size is 128-bit (or has padding) then st_blocks should have the same.

> 		unsigned long long	query_flags;

It is inconsistent to have all the other fields use the "st_" prefix, but "query_flags" and "struct_version" do not have this prefix.

> 	#define XSTAT_QUERY_AMC_TIMES		0x00000004ULL

Can these be split into separate ATIME, MTIME, CTIME flags?

> 	#define XSTAT_QUERY_CREATION_TIME	0x00000008ULL

It seems a bit inconsistent to call the field "st_btime" and the mask "CREATION_TIME".  It would be more consistent (if somewhat less clear) to call the mask "BTIME".  The struct definition should get short comments for each field to explain their meaning anyway, so "st_btime" can have "/* birth/creation time */".


This is also a bit inconsistent with the "st_gen" field name.

> 	#define XSTAT_QUERY_DATA_VERSION	0x00000040ULL

It wouldn't be a bad idea to interleave these flags with each of the fields that they represent, to make it more clear what is included in each.

> 	#define XSTAT_QUERY__ORDINARY_SET	0x00000017ULL
> 	#define XSTAT_QUERY__GET_ANYWAY		0x0000007fULL

Could you provide some information what the semantic distinction between these is?  It might be useful to have an "XSTAT_QUERY_LEGACY_STAT" mask that returns only the fields that are in the previous struct stat, unless that is what "ORDINARY_SET" means, in which case it should be renamed I think.

> 	#define XSTAT_QUERY__DEFINED_SET	0x0000007fULL

It is smart to have a "DEFINED_SET" mask that maps to the currently-understood fields.  This ensures that applications compiled against a specific set of headers/struct will not request fields which they don't understand.  It might be better to call this "XSTAT_QUERY_ALL" so that it is more easily understood and used by callers, instead of the incorrect "-1" or "~0" that some may be tempted to use if they don't understand what "__DEFINED_SET" means.

Cheers, Andreas

More information about the samba-technical mailing list