[PATCH 3/3] Add a pair of system calls to make extended file stats available [ver #2]

Trond Myklebust trond.myklebust at fys.uio.no
Tue Jun 29 19:48:56 MDT 2010


On Wed, 2010-06-30 at 02:17 +0100, David Howells wrote:
> Add a pair of system calls to make extended file stats available, including
> file creation time, inode version and data version where available through the
> underlying filesystem:
> 
> 	struct xstat_dev {
> 		unsigned int	major;
> 		unsigned int	minor;
> 	};
> 
> 	struct xstat_time {
> 		unsigned long long	tv_sec;
> 		unsigned long long	tv_nsec;
> 	};
> 
> 	struct xstat {
> 		unsigned int		struct_version;
> 	#define XSTAT_STRUCT_VERSION	0
> 		unsigned int		st_mode;
> 		unsigned int		st_nlink;
> 		unsigned int		st_uid;
> 		unsigned int		st_gid;
> 		unsigned int		st_blksize;
> 		struct xstat_dev	st_rdev;
> 		struct xstat_dev	st_dev;
> 		unsigned long long	st_ino;
> 		unsigned long long	st_size;
> 		struct xstat_time	st_atime;
> 		struct xstat_time	st_mtime;
> 		struct xstat_time	st_ctime;
> 		struct xstat_time	st_btime;
> 		unsigned long long	st_blocks;
> 		unsigned long long	st_gen;
> 		unsigned long long	st_data_version;
> 		unsigned long long	query_flags;
> 	#define XSTAT_QUERY_SIZE		0x00000001ULL
> 	#define XSTAT_QUERY_NLINK		0x00000002ULL
> 	#define XSTAT_QUERY_AMC_TIMES		0x00000004ULL
> 	#define XSTAT_QUERY_CREATION_TIME	0x00000008ULL
> 	#define XSTAT_QUERY_BLOCKS		0x00000010ULL
> 	#define XSTAT_QUERY_INODE_GENERATION	0x00000020ULL
> 	#define XSTAT_QUERY_DATA_VERSION	0x00000040ULL
> 		unsigned long long	extra_results[0];
> 	};
> 
> 	ssize_t ret = xstat(int dfd,
> 			    const char *filename,
> 			    unsigned atflag,
> 			    struct xstat *buffer,
> 			    size_t buflen);
> 
> 	ssize_t ret = fxstat(int fd,
> 			     struct xstat *buffer,
> 			     size_t buflen);
> 
> 
> The dfd, filename, atflag and fd parameters indicate the file to query.  There
> is no equivalent of lstat() as that can be emulated with xstat(), passing
> AT_SYMLINK_NOFOLLOW instead of 0 as atflag.
> 
> When the system call is executed, the struct_version ID and query_flags bitmask
> are read from the buffer to work out what the user is requesting.
> 
> If the structure version specified is not supported, the system call will
> return ENOTSUPP.  The above structure is version 0.
> 
> The query_flags should be set by the caller to specify extra results that the
> caller may desire.  These come in three classes:
> 
>  (1) Size, nlinks, [amc]times and block count.
> 
>      These will be returned whether the caller asks for them or not.  The
>      corresponding bits in query_flags will be set to indicate their presence.
> 
>      If the caller didn't ask for them, then they may be approximated.  For
>      example, NFS won't waste any time updating them from the server, unless
>      as a byproduct of updating something requested.
> 
> 	Query Flag			Field
> 	===============================	================
> 	XSTAT_QUERY_SIZE		st_size
> 	XSTAT_QUERY_NLINK		st_nlink
> 	XSTAT_QUERY_AMC_TIMES		st_[amc]time
> 	XSTAT_QUERY_BLOCKS		st_blocks
> 
>  (2) Creation time, Inode generation and Data version.
> 
>      These will be returned if available whether the caller asked for them or
>      not.  The corresponding bits in query_flags will be set or cleared as
>      appropriate to indicate their presence.
> 
> 	Query Flag			Field
> 	===============================	================
> 	XSTAT_QUERY_CREATION_TIME	st_btime
> 	XSTAT_QUERY_INODE_GENERATION	st_gen
> 	XSTAT_QUERY_DATA_VERSION	st_data_version
> 
>      If the caller didn't ask for them, then they may be approximated.  For
>      example, NFS won't waste any time updating them from the server, unless
>      as a byproduct of updating something requested.
> 
>  (3) Extra results.
> 
>      These will only be returned if the caller asked for them by setting their
>      bits in query_flags.  They will be placed in the buffer after the xstat
>      struct in ascending query_flags bit order.  Any bit set in query_flags
>      mask will be left set if the result is available and cleared otherwise.
> 
>      The pointer into the results list will be rounded up to the nearest 8-byte
>      boundary after each result is written in.  The size of each extra result
>      is specific to the definition for that result.
> 
>      No extra results are currently defined.
> 
> If the buffer is insufficiently big, the syscall returns the amount of space it
> will need to write the complete result set, but otherwise does nothing.
> 
> If successful, the amount of data written into the buffer will be returned.
> 
> At the moment, this will only work on x86_64 as it requires system calls to be
> wired up.
> 
> 
> ===========
> FILESYSTEMS
> ===========
> 
> The following filesystems have been modified to make use of this facility:
> 
>  (*) Ext4.  This will return the creation time and inode version number for all
>      files.  It will, however, only return the data version number for
>      directories as i_version is only maintained for them.
> 
>  (*) AFS.  This will return the vnode ID uniquifier as the inode version and
>      the AFS data version number as the data version.  There is no file
>      creation time available.
> 
>  (*) NFS.  This will return the change attribute, but only for NFSv4.  No
>      other extra values are returned at this time.  If mtime and ctime aren't
>      asked for, the outstanding writes won't be written to the server.  If
>      none of [amc]time, size, nlink, blocks and data_version are requested,
>      then the attributes won't be refreshed from the server.
> 
>      Probably this isn't sufficient, as the other non-optional attributes may
>      require refreshing.
> 
> 
> =======
> TESTING
> =======
> 
> The following test program can be used to test the xstat system call:
> 
> 	#define _GNU_SOURCE
> 	#define _ATFILE_SOURCE
> 	#include <stdio.h>
> 	#include <stdlib.h>
> 	#include <string.h>
> 	#include <unistd.h>
> 	#include <fcntl.h>
> 	#include <time.h>
> 	#include <sys/syscall.h>
> 	#include <sys/stat.h>
> 	#include <sys/types.h>
> 
> 	struct xstat_dev {
> 		unsigned int	major;
> 		unsigned int	minor;
> 	};
> 
> 	struct xstat_time {
> 		unsigned long long	tv_sec;
> 		unsigned long long	tv_nsec;
> 	};
> 
> 	struct xstat {
> 		unsigned int		struct_version;
> 	#define XSTAT_STRUCT_VERSION	0
> 		unsigned int		st_mode;
> 		unsigned int		st_nlink;
> 		unsigned int		st_uid;
> 		unsigned int		st_gid;
> 		unsigned int		st_blksize;
> 		struct xstat_dev	st_rdev;
> 		struct xstat_dev	st_dev;
> 		unsigned long long	st_ino;
> 		unsigned long long	st_size;
> 		struct xstat_time	st_atim;
> 		struct xstat_time	st_mtim;
> 		struct xstat_time	st_ctim;
> 		struct xstat_time	st_btim;
> 		unsigned long long	st_blocks;
> 		unsigned long long	st_gen;
> 		unsigned long long	st_data_version;
> 		unsigned long long	query_flags;
> 	#define XSTAT_QUERY_SIZE		0x00000001ULL	/* want/got st_size */
> 	#define XSTAT_QUERY_NLINK		0x00000002ULL	/* want/got st_nlink */
> 	#define XSTAT_QUERY_AMC_TIMES		0x00000004ULL	/* want/got st_[amc]time */
> 	#define XSTAT_QUERY_CREATION_TIME	0x00000008ULL	/* want/got st_btime */
> 	#define XSTAT_QUERY_BLOCKS		0x00000010ULL	/* want/got st_blocks */
> 	#define XSTAT_QUERY_INODE_GENERATION	0x00000020ULL	/* want/got st_gen */
> 	#define XSTAT_QUERY_DATA_VERSION	0x00000040ULL	/* want/got st_data_version */
> 	#define XSTAT_QUERY__ORDINARY_SET	0x00000017ULL	/* the stuff in the normal stat struct */
> 	#define XSTAT_QUERY__GET_ANYWAY		0x0000007fULL	/* what we get anyway if available */
> 	#define XSTAT_QUERY__DEFINED_SET	0x0000007fULL	/* the defined set of flags */
> 		unsigned long long	extra_results[0];
> 	};
> 
> 	#define __NR_xstat				300
> 	#define __NR_fxstat				301
> 
> 	static __attribute__((unused))
> 	ssize_t xstat(int dfd, const char *filename, int atflag,
> 			     struct xstat *buffer, size_t bufsize)
> 	{
> 		return syscall(__NR_xstat, dfd, filename, atflag, buffer, bufsize);
> 	}
> 
> 	static __attribute__((unused))
> 	ssize_t fxstat(int fd, struct xstat *buffer, size_t bufsize)
> 	{
> 		return syscall(__NR_fxstat, fd, buffer, bufsize);
> 	}
> 
> 	static void print_time(const struct xstat_time *xstm)
> 	{
> 		struct tm tm;
> 		time_t tim;
> 		char buffer[100];
> 		int len;
> 
> 		tim = xstm->tv_sec;
> 		if (!localtime_r(&tim, &tm)) {
> 			perror("localtime_r");
> 			exit(1);
> 		}
> 		len = strftime(buffer, 100, "%F %T", &tm);
> 		if (len == 0) {
> 			perror("strftime");
> 			exit(1);
> 		}
> 		fwrite(buffer, 1, len, stdout);
> 		printf(".%09llu", xstm->tv_nsec);
> 		len = strftime(buffer, 100, "%z", &tm);
> 		if (len == 0) {
> 			perror("strftime2");
> 			exit(1);
> 		}
> 		fwrite(buffer, 1, len, stdout);
> 	}
> 
> 	static void dump_xstat(struct xstat *xst)
> 	{
> 		char buffer[256], ft;
> 
> 		printf(" ");
> 		if (xst->query_flags & XSTAT_QUERY_SIZE)
> 			printf(" Size: %-15llu", xst->st_size);
> 		if (xst->query_flags & XSTAT_QUERY_BLOCKS)
> 			printf(" Blocks: %-10llu", xst->st_blocks);
> 		printf(" IO Block: %-6u ", xst->st_blksize);
> 		switch (xst->st_mode & S_IFMT) {
> 		case S_IFIFO:	printf(" FIFO\n");			ft = 'p'; break;
> 		case S_IFCHR:	printf(" character special file\n");	ft = 'c'; break;
> 		case S_IFDIR:	printf(" directory\n");			ft = 'd'; break;
> 		case S_IFBLK:	printf(" block special file\n");	ft = 'b'; break;
> 		case S_IFREG:	printf(" regular file\n");		ft = '-'; break;
> 		case S_IFLNK:	printf(" symbolic link\n");		ft = 'l'; break;
> 		case S_IFSOCK:	printf(" socket\n");			ft = 's'; break;
> 		default:
> 			printf("unknown type (%o)\n", xst->st_mode & S_IFMT);
> 			ft = '?';
> 			break;
> 		}
> 
> 		sprintf(buffer, "%02x:%02x", xst->st_dev.major, xst->st_dev.minor);
> 		printf("Device: %-15s Inode: %-11llu", buffer, xst->st_ino);
> 		if (xst->query_flags & XSTAT_QUERY_NLINK)
> 			printf(" Links: %u", xst->st_nlink);
> 		printf("\n");
> 
> 		printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c)  ",
> 		       xst->st_mode & 07777,
> 		       ft,
> 		       xst->st_mode & S_IRUSR ? 'r' : '-',
> 		       xst->st_mode & S_IWUSR ? 'w' : '-',
> 		       xst->st_mode & S_IXUSR ? 'x' : '-',
> 		       xst->st_mode & S_IRGRP ? 'r' : '-',
> 		       xst->st_mode & S_IWGRP ? 'w' : '-',
> 		       xst->st_mode & S_IXGRP ? 'x' : '-',
> 		       xst->st_mode & S_IROTH ? 'r' : '-',
> 		       xst->st_mode & S_IWOTH ? 'w' : '-',
> 		       xst->st_mode & S_IXOTH ? 'x' : '-');
> 		printf("Uid: %d   Gid: %u\n", xst->st_uid, xst->st_gid);
> 
> 		if (xst->query_flags & XSTAT_QUERY_AMC_TIMES) {
> 			printf("Access: "); print_time(&xst->st_atim); printf("\n");
> 			printf("Modify: "); print_time(&xst->st_mtim); printf("\n");
> 			printf("Change: "); print_time(&xst->st_ctim); printf("\n");
> 		}
> 		if (xst->query_flags & XSTAT_QUERY_CREATION_TIME) {
> 			printf("Create: "); print_time(&xst->st_btim); printf("\n");
> 		}
> 
> 		if (xst->query_flags & XSTAT_QUERY_INODE_GENERATION)
> 			printf("Inode version: %llxh\n", xst->st_gen);
> 		if (xst->query_flags & XSTAT_QUERY_DATA_VERSION)
> 			printf("Data version: %llxh\n", xst->st_data_version);
> 	}
> 
> 	int main(int argc, char **argv)
> 	{
> 		struct xstat xst;
> 		int ret, atflag = AT_SYMLINK_NOFOLLOW;
> 
> 		unsigned long long query =
> 			XSTAT_QUERY__ORDINARY_SET |
> 			XSTAT_QUERY_CREATION_TIME |
> 			XSTAT_QUERY_INODE_GENERATION |
> 			XSTAT_QUERY_DATA_VERSION;
> 
> 		for (argv++; *argv; argv++) {
> 			if (strcmp(*argv, "-L") == 0) {
> 				atflag = 0;
> 				continue;
> 			}
> 			if (strcmp(*argv, "-O") == 0) {
> 				query &= ~XSTAT_QUERY__ORDINARY_SET;
> 				continue;
> 			}
> 
> 			memset(&xst, 0xbf, sizeof(xst));
> 			xst.struct_version = 0;
> 			xst.query_flags = query;
> 			ret = xstat(AT_FDCWD, *argv, atflag, &xst, sizeof(xst));
> 			printf("xstat(%s) = %d\n", *argv, ret);
> 			if (ret < 0) {
> 				perror(*argv);
> 				exit(1);
> 			}
> 
> 			printf("sv=%u qf=%llx cr=%llx.%llx iv=%llx dv=%llx\n",
> 			       xst.struct_version, xst.query_flags,
> 			       xst.st_btim.tv_sec, xst.st_btim.tv_nsec,
> 			       xst.st_gen, xst.st_data_version);
> 
> 			dump_xstat(&xst);
> 		}
> 		return 0;
> 	}
> 
> Just compile and run, passing it paths to the files you want to examine:
> 
> 	[root@andromeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/i386/repodata/
> 	xstat(/afs/archive/linuxdev/fedora9/i386/repodata/) = 152
> 	sv=0 qf=77 cr=0.0 iv=7a5 dv=5
> 	  Size: 2048            Blocks: 0          IO Block: 4096    directory
> 	Device: 00:15           Inode: 83          Links: 2
> 	Access: (0755/drwxr-xr-x)  Uid: 75338   Gid: 0
> 	Access: 2008-11-05 20:00:12.000000000+0000
> 	Modify: 2008-11-05 20:00:12.000000000+0000
> 	Change: 2008-11-05 20:00:12.000000000+0000
> 	Inode version: 7a5h
> 	Data version: 5h
> 
> 	[root@andromeda ~]# /tmp/xstat /warthog/nfs/linux-2.6-fscache
> 	xstat(/warthog/nfs/linux-2.6-fscache) = 152
> 	sv=0 qf=57 cr=0.0 iv=0 dv=f4992a4c00000000
> 	  Size: 4096            Blocks: 16         IO Block: 1048576  directory
> 	Device: 00:13           Inode: 19005487    Links: 27
> 	Access: (2775/drwxrwxr-x)  Uid: -2   Gid: 4294967294
> 	Access: 2010-06-30 02:07:42.000000000+0100
> 	Modify: 2010-06-30 02:12:20.000000000+0100
> 	Change: 2010-06-30 02:12:20.000000000+0100
> 	Data version: f4992a4c00000000h
> 
> 	[root@andromeda ~]# /tmp/xstat /var/cache/fscache/cache/
> 	xstat(/var/cache/fscache/cache/) = 152
> 	sv=0 qf=7f cr=4c24ba83.1c15ee3d iv=f585ab70 dv=2
> 	  Size: 4096            Blocks: 16         IO Block: 4096    directory
> 	Device: 08:06           Inode: 130561      Links: 3
> 	Access: (0700/drwx------)  Uid: 0   Gid: 0
> 	Access: 2010-06-29 18:16:33.680703545+0100
> 	Modify: 2010-06-29 18:16:20.132786632+0100
> 	Change: 2010-06-29 18:16:20.132786632+0100
> 	Create: 2010-06-25 15:17:39.471199293+0100
> 	Inode version: f585ab70h
> 	Data version: 2h

Yes, but could we please also add a flag that allows you to specify that
the kernel _must_ provide up-to-date attributes?

IOW: a flag that for something like NFS or CIFS will force a GETATTR RPC
call on the wire as opposed to using cached values.

Cheers
  Trond


