Fwd: application control of pre-allocation

Amit K. Arora aarora at linux.vnet.ibm.com
Thu Jul 26 06:25:46 GMT 2007

On Wed, Jul 25, 2007 at 09:13:36PM -0500, Steve French wrote:
> Amit,
Hi Steve,

> What is the exact fallocate syntax?  I was having trouble deciphering
> the pieces of the raw man page source and related thread that I saw on
> lkml
fallocate takes four arguments: fd, mode, offset and len. fd is the file
descriptor obtained from open(2). mode says whether the file size should
change when the allocation extends beyond EOF. offset is the position (in
bytes) in the file where preallocation starts, and len is the number of
bytes, counted from offset, to preallocate.
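
As a rough illustration of the four arguments, here is a minimal sketch
of a call. It assumes a Linux system whose glibc provides a fallocate()
wrapper in <fcntl.h> when _GNU_SOURCE is defined (an assumption about the
build environment, not something stated above). The helper name
preallocate_range is hypothetical.

```c
#define _GNU_SOURCE   /* must precede the includes so glibc declares fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate len bytes starting at byte position
 * offset in the file open on fd. Passing mode 0 selects the default
 * behavior, where the file size may grow.
 * Returns 0 on success or the errno value reported by fallocate().
 */
int preallocate_range(int fd, off_t offset, off_t len)
{
    assert(fd >= 0);   /* caller passes a descriptor obtained from open(2) */

    if (fallocate(fd, 0, offset, len) != 0)
        return errno;  /* e.g. EOPNOTSUPP if the filesystem lacks support */
    return 0;
}
```

A caller would open the file for writing, call
preallocate_range(fd, 0, 1 << 20) to reserve the first megabyte, and
treat any nonzero return as an errno value.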

When called on a file in a particular filesystem, this should allocate
space (blocks) in the range specified. Ideally, the filesystem should
not zero out the data (for speed) and still should be able to return
zeroes when a read is done on these preallocated blocks (to avoid
leaking stale data - which can be a security threat).
In ext4 we do this by making the newly preallocated blocks part of special
extents called "uninitialized extents". A single bit in the ee_len field
of the extent structure marks an extent as initialized or uninitialized.
When a read request hits an uninitialized extent, we return zeroes
without having to read the physical blocks.
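
The flag-bit idea can be sketched in user-space C as follows. This is
only an illustration of the technique: the structure, the names and the
choice of the top bit of a 16-bit length field are all assumptions made
for the example, and the actual ext4 encoding may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout only; these names and values are not ext4's. */
#define DEMO_UNINIT_BIT 0x8000u   /* top bit of the 16-bit length field */
#define DEMO_LEN_MASK   0x7FFFu   /* remaining 15 bits hold the block count */

struct demo_extent {
    uint32_t first_block;   /* first logical block covered by the extent */
    uint16_t len_field;     /* block count plus the uninitialized flag */
};

/* Build an extent for freshly preallocated (never written) blocks. */
struct demo_extent demo_make_uninit_extent(uint32_t first_block, uint16_t nblocks)
{
    assert(nblocks <= DEMO_LEN_MASK);   /* count must fit below the flag bit */
    struct demo_extent ex = { first_block, (uint16_t)(nblocks | DEMO_UNINIT_BIT) };
    return ex;
}

bool demo_is_uninitialized(const struct demo_extent *ex)
{
    return (ex->len_field & DEMO_UNINIT_BIT) != 0;
}

uint16_t demo_block_count(const struct demo_extent *ex)
{
    return (uint16_t)(ex->len_field & DEMO_LEN_MASK);
}

/* Once real data has been written, clear the flag and keep the count. */
void demo_mark_initialized(struct demo_extent *ex)
{
    ex->len_field = (uint16_t)(ex->len_field & DEMO_LEN_MASK);
}

/*
 * Read path: for an uninitialized extent, fill buf with zeroes and return
 * true, so no disk read is needed. Return false when the caller has to
 * read the physical blocks instead.
 */
bool demo_read_as_zeroes(const struct demo_extent *ex, void *buf, size_t len)
{
    if (!demo_is_uninitialized(ex))
        return false;
    memset(buf, 0, len);
    return true;
}
```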

Similarly, each filesystem that wants to implement fallocate() should
have an efficient way of doing this in the kernel. If not, it's better to
use posix_fallocate() on that filesystem, which in any case writes zeroes
to the new blocks that have to be allocated.

The plan is for posix_fallocate() (which is already part of glibc) to
call sys_fallocate() and, if sys_fallocate is not supported by the kernel
or the filesystem, to fall back to the regular (and current) behavior of
writing zeroes to new blocks.
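
The try-then-fall-back pattern can be sketched roughly like this. It is
not glibc's implementation: the function names are hypothetical, the
fallocate() wrapper in <fcntl.h> is assumed to be available, and the
fallback is simplified to writing zeroes past the current end of file
only (holes inside the existing file are not filled).

```c
#define _GNU_SOURCE   /* must precede the includes so glibc declares fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Simplified fallback: grow the file to `end` bytes by writing zeroes
 * after the current EOF. Returns 0 on success or an errno value.
 */
int zero_fill_to(int fd, off_t end)
{
    static const char zeroes[4096] = {0};
    struct stat st;

    if (fstat(fd, &st) != 0)
        return errno;

    off_t pos = st.st_size;
    while (pos < end) {
        size_t chunk = sizeof zeroes;
        if ((off_t)chunk > end - pos)
            chunk = (size_t)(end - pos);

        ssize_t n = pwrite(fd, zeroes, chunk, pos);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return errno;
        }
        if (n == 0)
            return EIO;         /* no progress; avoid looping forever */
        pos += n;
    }
    return 0;
}

/*
 * Hypothetical wrapper: try fallocate() first and use the zero-writing
 * path only when the kernel or the filesystem does not support it.
 */
int prealloc_with_fallback(int fd, off_t offset, off_t len)
{
    assert(fd >= 0);
    if (offset < 0 || len <= 0)
        return EINVAL;

    if (fallocate(fd, 0, offset, len) == 0)
        return 0;
    if (errno != ENOSYS && errno != EOPNOTSUPP)
        return errno;           /* a real failure such as ENOSPC */

    return zero_fill_to(fd, offset + len);
}
```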

The man page covers the syntax of fallocate in detail, and I am not sure
why you had trouble understanding it. I will paste part of the man page
here for your reference. If you still find it confusing or unclear,
please let me know and I will try to improve it.

       fallocate - manipulate file space

       #include <linux/falloc.h>

       long fallocate(int fd, int mode, loff_t offset, loff_t len);

       fallocate() allows the caller to directly manipulate the allocated
       disk space for the file referred to by fd for the byte range
       starting at offset and continuing for len bytes.

       The mode argument determines the operation to be performed on the
       given range. Currently only one flag is supported for mode:

       FALLOC_FL_KEEP_SIZE
              This flag allocates and initializes to zero the disk space
              within the given range. After a successful call, subsequent
              writes are guaranteed not to fail because of lack of disk
              space. Even if the size of the file is less than offset+len,
              the file size is not changed. This allows allocation of
              zeroed blocks beyond the end of file and is useful for
              optimizing append workloads.

       If the FALLOC_FL_KEEP_SIZE flag is not specified in mode, the
       default behavior is almost the same as when this flag is specified.
       The only difference is that on success, the file size will be
       changed if offset+len is greater than the file size. This default
       behavior closely resembles the behavior of the posix_fallocate(3)
       library function, and is intended as a method of optimally
       implementing that function.

       fallocate() may allocate a larger range than was specified.
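
To see the difference the FALLOC_FL_KEEP_SIZE flag makes to the file
size, something like the following sketch could be used. It assumes a
glibc that exposes fallocate() through <fcntl.h> and a filesystem that
supports the call. The helper name prealloc_and_get_size is hypothetical.

```c
#define _GNU_SOURCE   /* must precede the includes so glibc declares fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>   /* FALLOC_FL_KEEP_SIZE */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call fallocate() with the given mode, then report
 * the resulting file size through size_out.
 * Returns 0 on success or an errno value.
 */
int prealloc_and_get_size(int fd, int mode, off_t offset, off_t len,
                          off_t *size_out)
{
    struct stat st;

    assert(fd >= 0 && size_out != NULL);

    if (fallocate(fd, mode, offset, len) != 0)
        return errno;
    if (fstat(fd, &st) != 0)
        return errno;

    *size_out = st.st_size;   /* unchanged under FALLOC_FL_KEEP_SIZE */
    return 0;
}
```

On an empty file, a call with FALLOC_FL_KEEP_SIZE for 64 KiB at offset 0
should leave the reported size at 0, while the same call with mode 0
should report 65536.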

> I am trying to decide if this is a trivial mapping to cifs transact 2
> (to send over the network to e.g. a file mounted from a Windows or
> Samba server)

I do not know much about cifs and hence cannot comment on this.

Please let me know how I can be of further help.

Amit Arora

> jra,
> Are you assuming that this is simply a call to
> SMB_SET_FILE_ALLOCATION_INFO2 (level 0x3fb)?  If so this may be easy
> but I was not sure if the syntax matched. Interesting that on the
> server side although there aren't any calls to this new sys call yet
> in Samba of course, it looks like Samba source/smbd/trans2.c in
> function smb_set_allocation_info could handle this efficiently on some
> OS - through overriding in vfs_allocate_file_space (source/smbd/vfs.c)
> ---------- Forwarded message ----------
> From: Jeremy Allison <jra at samba.org>
> Date: Jul 25, 2007 4:14 PM
> Subject: Re: application control of pre-allocation
> To: Steve French <smfrench at gmail.com>
> Cc: jra at samba.org
> On Wed, Jul 25, 2007 at 03:43:24PM -0500, Steve French wrote:
> >Any thoughts about whether we should extend an operation to handle
> >this new sys call or if it is already handleable?
> This is already handled as a couple of trans2 operations.
> Jeremy.
> -- 
> Thanks,
> Steve

More information about the samba-technical mailing list