Fwd: application control of pre-allocation
Amit K. Arora
aarora at linux.vnet.ibm.com
Thu Jul 26 06:25:46 GMT 2007
On Wed, Jul 25, 2007 at 09:13:36PM -0500, Steve French wrote:
> What is the exact fallocate syntax? I was having trouble deciphering
> the pieces of the raw man page source and related thread that I saw on
fallocate takes four arguments: fd, mode, offset and len. fd is the
file descriptor obtained from open(2). mode tells whether the file size
should change when the allocation extends beyond EOF. offset is the
position (in bytes) in the file where preallocation starts, and len is
the number of bytes (from offset) to be preallocated.
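As a rough illustration of how the four arguments fit together, here is a minimal sketch in C. It assumes the glibc wrapper declared in <fcntl.h> under _GNU_SOURCE, which takes off_t rather than the loff_t of the raw syscall. preallocate_range() is a hypothetical helper name, not part of any API.

```c
#define _GNU_SOURCE            /* exposes fallocate() in glibc's <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the filesystem to preallocate `len` bytes
 * starting at byte `offset` of the file open on `fd`.
 * keep_size != 0 requests that the file size stay unchanged even when
 * the range extends past EOF.
 * Returns 0 on success, -1 with errno set on failure.
 */
int preallocate_range(int fd, off_t offset, off_t len, int keep_size)
{
    int mode = keep_size ? FALLOC_FL_KEEP_SIZE : 0;

    /* The kernel rejects a negative offset or non-positive len (EINVAL). */
    assert(offset >= 0 && len > 0);

    return fallocate(fd, mode, offset, len);
}
```

For example, preallocate_range(fd, 0, 1 << 20, 1) would ask for the first 1 MiB of the file to be backed by blocks without changing the file size.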
When called on a file in a particular filesystem, this should allocate
space (blocks) in the range specified. Ideally, the filesystem should
not zero out the data (for speed), yet should still be able to return
zeroes when a read is done on these preallocated blocks (to avoid
leaking stale data - which can be a security threat).
In ext4 we do this by making the newly preallocated blocks part of
special extents called "uninitialized extents". We use a single bit in
the ee_len field of the extent structure to mark an extent as
initialized or uninitialized. When a read request hits an uninitialized
extent, we return zeroes without having to read the physical blocks.
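The flag-in-the-length-field idea can be sketched roughly as below. This is a simplified illustration, not the ext4 code: the struct, the macro names and the choice of the top bit of a 16-bit field are all assumptions made for the example, and the real on-disk encoding may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; this is not the kernel's code. */
#define DEMO_UNINIT_FLAG 0x8000u   /* assumed: top bit of the length field */
#define DEMO_MAX_LEN     0x7FFFu   /* largest length the low 15 bits hold  */

struct demo_extent {
    uint32_t first_block;  /* first logical block covered by the extent */
    uint16_t len_field;    /* length in blocks; top bit = uninitialized  */
};

struct demo_extent demo_extent_make(uint32_t first_block, uint16_t len)
{
    struct demo_extent ex;

    assert(len <= DEMO_MAX_LEN);   /* the flag bit must stay free */
    ex.first_block = first_block;
    ex.len_field = len;
    return ex;
}

uint16_t demo_extent_len(const struct demo_extent *ex)
{
    return (uint16_t)(ex->len_field & DEMO_MAX_LEN);
}

int demo_extent_is_uninit(const struct demo_extent *ex)
{
    return (ex->len_field & DEMO_UNINIT_FLAG) != 0;
}

/* Preallocation path: blocks are allocated but never zeroed on disk. */
void demo_extent_mark_uninit(struct demo_extent *ex)
{
    ex->len_field = (uint16_t)(ex->len_field | DEMO_UNINIT_FLAG);
}

/* Called once real data has been written to the extent. */
void demo_extent_mark_init(struct demo_extent *ex)
{
    ex->len_field = (uint16_t)(ex->len_field & DEMO_MAX_LEN);
}

/*
 * Read path: an uninitialized extent yields zeroes without any disk I/O.
 * Returns 1 if buf was zero-filled, 0 if the caller must read the
 * physical blocks instead.
 */
int demo_read_block(const struct demo_extent *ex, unsigned char *buf, size_t n)
{
    if (demo_extent_is_uninit(ex)) {
        memset(buf, 0, n);
        return 1;
    }
    return 0;
}
```

The point of the design is that preallocation only has to flip one bit per extent, so stale block contents are never exposed and nothing has to be written at allocation time.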
Similarly, each filesystem that wants to implement fallocate() needs an
efficient way of doing this in the kernel. If it has none, it is better
to rely on posix_fallocate() for that filesystem, which writes zeroes to
the newly allocated blocks anyway.
It is planned that posix_fallocate() (which is already part of glibc)
should call sys_fallocate() and if it finds that sys_fallocate is not
supported on the kernel or the file system, it should fall back to the
regular (and current) behavior of writing zeroes to new blocks.
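The planned fallback could look roughly like the sketch below. It is not glibc's implementation: the function names are hypothetical, it assumes an unsupported kernel or filesystem is reported as ENOSYS or EOPNOTSUPP, and the zero-writing path only extends the file past its current EOF (it does not fill holes inside existing data).

```c
#define _GNU_SOURCE            /* exposes fallocate() in glibc's <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback path: write zeroes over [start, end). Only safe beyond EOF. */
int zero_fill(int fd, off_t start, off_t end)
{
    char buf[4096];

    memset(buf, 0, sizeof buf);
    while (start < end) {
        size_t want = sizeof buf;
        ssize_t n;

        if ((off_t)want > end - start)
            want = (size_t)(end - start);
        n = pwrite(fd, buf, want, start);
        if (n < 0 && errno == EINTR)
            continue;                /* interrupted, retry the same chunk */
        if (n <= 0)
            return -1;               /* real failure, e.g. ENOSPC */
        start += n;
    }
    return 0;
}

/* Try the fast path first, fall back to writing zeroes if unsupported. */
int prealloc_with_fallback(int fd, off_t offset, off_t len)
{
    struct stat st;
    off_t start;

    assert(offset >= 0 && len > 0);

    if (fallocate(fd, 0, offset, len) == 0)
        return 0;                    /* kernel and filesystem support it */
    if (errno != ENOSYS && errno != EOPNOTSUPP)
        return -1;                   /* genuine error, do not mask it */

    if (fstat(fd, &st) < 0)
        return -1;
    /* Never overwrite existing data: start at EOF if offset is inside it. */
    start = offset > st.st_size ? offset : st.st_size;
    return zero_fill(fd, start, offset + len);
}
```

Either path leaves the file at least offset+len bytes long; the difference is only in how much I/O it takes to get there.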
The man page covers the syntax of fallocate in detail, and I am not sure
what made it hard to follow. I will paste part of the man page here for
your reference. If you still find it confusing or unclear, please let me
know and I will try to improve it.
NAME
fallocate - manipulate file space
SYNOPSIS
long fallocate(int fd, int mode, loff_t offset, loff_t len);
DESCRIPTION
fallocate() allows the caller to directly manipulate the
allocated disk space for the file referred to by fd for the byte range
starting at offset and continuing for len bytes.
The mode argument determines the operation to be performed on the
given range. Currently only one flag is supported for mode:
FALLOC_FL_KEEP_SIZE allocates and initializes to zero the disk space
within the given range. After a successful call, subsequent
writes are guaranteed not to fail because of lack of disk space.
Even if the size of the file is less than offset+len, the file
size is not changed. This allows allocation of zeroed blocks
beyond the end of file and is useful for optimizing append workloads.
If the FALLOC_FL_KEEP_SIZE flag is not specified in mode, the default
behavior is almost the same as when this flag is specified. The only
difference is that on success, the file size will be changed if
the offset+len is greater than the file size. This default behavior
closely resembles the behavior of the posix_fallocate(3) library
function, and is intended as a method of optimally implementing that
function.
fallocate() may allocate a larger range than was specified.
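To make the difference between the two modes concrete, here is a small sketch that reports the file size after each kind of call. It assumes the glibc fallocate() wrapper; size_after_fallocate() and the scratch file path it is given are hypothetical.

```c
#define _GNU_SOURCE            /* exposes fallocate() in glibc's <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an empty scratch file at `path`, preallocate `len` bytes from
 * offset 0 with the given mode, and return the resulting st_size.
 * Returns -1 if any step fails (for example when the filesystem does
 * not support fallocate). The scratch file is removed before returning.
 */
off_t size_after_fallocate(const char *path, int mode, off_t len)
{
    struct stat st;
    off_t size = -1;
    int fd;

    assert(mode == 0 || mode == FALLOC_FL_KEEP_SIZE);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (fallocate(fd, mode, 0, len) == 0 && fstat(fd, &st) == 0)
        size = st.st_size;   /* 0 with KEEP_SIZE, len with mode 0 */

    close(fd);
    unlink(path);
    return size;
}
```

On a filesystem that supports the call, FALLOC_FL_KEEP_SIZE should leave the size at 0 while the blocks are still reserved, and mode 0 should grow the size to len.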
> I am trying to decide if this is a trivial mapping to cifs transact 2
> (to send over the network to e.g. a file mounted from a Windows or
> Samba server)
I do not know much about cifs and hence cannot comment on this.
Please let me know how I can be of further help.
> Are you assuming that this is simply a call to
> SMB_SET_FILE_ALLOCATION_INFO2 (level 0x3fb)? If so this may be easy
> but I was not sure if the syntax matched. Interesting that on the
> server side although there aren't any calls to this new sys call yet
> in Samba of course, it looks like Samba source/smbd/trans2.c in
> function smb_set_allocation_info could handle this efficiently on some
> OS - through overriding in vfs_allocate_file_space (source/smbd/vfs.c)
> ---------- Forwarded message ----------
> From: Jeremy Allison <jra at samba.org>
> Date: Jul 25, 2007 4:14 PM
> Subject: Re: application control of pre-allocation
> To: Steve French <smfrench at gmail.com>
> Cc: jra at samba.org
> On Wed, Jul 25, 2007 at 03:43:24PM -0500, Steve French wrote:
> >Any thoughts about whether we should extend an operation to handle
> >this new sys call or if it is already handleable?
> This is already handled as a couple of trans2 operations.