Fwd: application control of pre-allocation
Amit K. Arora
aarora at linux.vnet.ibm.com
Fri Jul 27 12:14:32 GMT 2007
On Thu, Jul 26, 2007 at 09:58:07AM -0500, Steve French wrote:
> Now that I see the man page, we will need to see if we can allocate a
> range in the middle of a file - I am fairly certain that we can extend
> the allocation size of a file with the existing network file system
> (CIFS) syntax but I thought that the syntax was similar to set file
> size (i.e. just the new end of file offset, not a starting offset and
> length). Is the most common usage of this call expected to be with
> offset 0 and length beyond end of file?
This call can be used to allocate space beyond EOF and also in the
middle of the file (in the case of sparse files). Since each application
can have its own requirements, the semantics give the user the
flexibility to specify the range, rather than just the "file size".
As for the most common usage, *I* think it will be from the current EOF
to a few more MBs/GBs, as with RDBMS servers and video streaming
applications. These, I think, will want to preallocate towards the end
of the file to guarantee space in advance and to minimize fragmentation
(and thus get better access speed).
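A minimal sketch of that pattern in C, assuming a current Linux system where glibc exposes the system call as fallocate(2) and the "don't change the file size" mode is spelled FALLOC_FL_KEEP_SIZE; reserve_ahead() is a hypothetical helper name, not an existing API. It reserves blocks just past the current EOF while leaving the visible file size alone, so later appends land in space that is already allocated.

```c
#define _GNU_SOURCE          /* exposes fallocate() and FALLOC_FL_KEEP_SIZE */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: reserve `chunk` bytes of blocks starting at the
 * current end of file without changing the file size, as an
 * append-heavy writer (database log, video recorder) might.
 * Returns 0 on success or an errno value (e.g. EOPNOTSUPP). */
static int reserve_ahead(int fd, off_t chunk)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return errno;
    /* offset = current EOF, len = chunk; KEEP_SIZE leaves st_size alone */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, st.st_size, chunk) != 0)
        return errno;
    return 0;
}
```

If the filesystem answers EOPNOTSUPP, the writer can simply carry on without the reservation; its appends still work, just without the space guarantee.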
But there may be some applications that just want to preallocate from
anywhere in the file.
Actually, as you may be aware, this is not a new idea. There is already
a posix_fallocate() library call in glibc that tries to do the same
thing from user space. sys_fallocate() is simply a more efficient way of
implementing this feature, from inside the kernel. And there are plans
for posix_fallocate() to start using sys_fallocate() internally, falling
back to the "current mechanism" of preallocating from user space (by
writing zeroes to new blocks) if the fallocate system call fails.
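The fallback described above can be sketched in C roughly as follows. This is not glibc's code: prealloc_with_fallback() is a hypothetical name, it assumes the Linux fallocate(2) wrapper, and treating EOPNOTSUPP or ENOSYS as "not supported here, take the slow path" is an assumption of the sketch. In the fallback it writes one zero byte per block, skipping blocks whose sampled byte is non-zero, so existing data is left intact.

```c
#define _GNU_SOURCE          /* exposes fallocate() in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try the kernel first; if the filesystem or kernel
 * cannot preallocate, allocate blocks from user space by writing zeroes.
 * Returns 0 on success or an errno value. */
static int prealloc_with_fallback(int fd, off_t offset, off_t len)
{
    if (offset < 0 || len <= 0)
        return EINVAL;
    if (fallocate(fd, 0, offset, len) == 0)
        return 0;                        /* fast, in-kernel path worked */
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return errno;                    /* a real error, e.g. ENOSPC */

    struct stat st;
    if (fstat(fd, &st) != 0)
        return errno;
    /* Assumes st_blksize matches the filesystem block size. */
    off_t step = st.st_blksize > 0 ? st.st_blksize : 4096;

    /* Touch one byte in every block of the range, ending exactly on the
     * last byte so the file grows to offset + len if needed. */
    for (off_t pos = offset + (len - 1) % step; pos < offset + len; pos += step) {
        if (pos < st.st_size) {
            unsigned char c;
            ssize_t n = pread(fd, &c, 1, pos);
            if (n < 0)
                return errno;
            if (n == 1 && c != 0)
                continue;                /* block holds data: already allocated */
        }
        ssize_t w = pwrite(fd, "", 1, pos);  /* one zero byte allocates the block */
        if (w != 1)
            return w < 0 ? errno : EIO;
    }
    return 0;
}
```

Touching one byte per block rather than zeroing the whole range keeps the slow path cheaper while still forcing each block to be allocated.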
> On 7/26/07, Amit K. Arora <aarora at linux.vnet.ibm.com> wrote:
> >On Wed, Jul 25, 2007 at 09:13:36PM -0500, Steve French wrote:
> >> Amit,
> >Hi Steve,
> >> What is the exact fallocate syntax? I was having trouble deciphering
> >> the pieces of the raw man page source and related thread that I saw on
> >> lkml
> >The syntax of fallocate is to pass fd, mode, offset and len as
> >arguments. fd is the file descriptor obtained from open(2). mode
> >tells whether or not the file size should change if the allocation
> >extends beyond EOF. offset is the position (in bytes) in the file
> >from which preallocation is requested, and len is the length in
> >bytes (from offset) to be preallocated.
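To make the four arguments concrete, here is a small C sketch assuming the fallocate(2) wrapper found in current glibc, where mode is 0 (the size may grow) or FALLOC_FL_KEEP_SIZE (blocks are reserved but the size is left alone). prealloc_and_size() is a hypothetical helper that performs the call and reports the resulting file size.

```c
#define _GNU_SOURCE          /* exposes fallocate() and FALLOC_FL_KEEP_SIZE */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical wrapper showing how each argument is used:
 *   fd     - descriptor returned by open(2), opened for writing
 *   mode   - 0: file size grows if offset + len passes EOF
 *            FALLOC_FL_KEEP_SIZE: blocks are reserved, size untouched
 *   offset - byte position where preallocation starts
 *   len    - number of bytes (from offset) to preallocate
 * Returns the resulting file size, or -1 with errno set. */
static off_t prealloc_and_size(int fd, int mode, off_t offset, off_t len)
{
    struct stat st;

    if (fallocate(fd, mode, offset, len) != 0)
        return -1;
    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size;
}
```

For example, on a 1 MiB sparse file, preallocating 64 KiB at offset 256 KiB with mode 0 fills part of the hole and leaves the size at 1 MiB; the same request at offset 1 MiB grows the file to 1 MiB + 64 KiB, whereas with FALLOC_FL_KEEP_SIZE the size stays at 1 MiB.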
> >When called on a file in a particular filesystem, this should allocate
> >space (blocks) in the specified range. Ideally, the filesystem should
> >not zero out the data (for speed), yet it should still return zeroes
> >when these preallocated blocks are read (to avoid leaking stale data,
> >which can be a security threat).
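From user space, that requirement can be checked by reading a freshly preallocated range back and confirming that every byte is zero. The sketch below uses POSIX pread(); range_reads_as_zero() is a hypothetical helper, and it only observes the result, not how the filesystem produces it.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if every byte in [offset, offset + len)
 * reads back as zero, 0 if any byte is non-zero, -1 on read error. */
static int range_reads_as_zero(int fd, off_t offset, off_t len)
{
    unsigned char buf[4096];

    while (len > 0) {
        size_t want = len < (off_t)sizeof buf ? (size_t)len : sizeof buf;
        ssize_t n = pread(fd, buf, want, offset);
        if (n < 0)
            return -1;
        if (n == 0)
            return 1;        /* past EOF: no stale bytes can be returned */
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != 0)
                return 0;    /* found real (or leaked) data */
        offset += n;
        len -= n;
    }
    return 1;
}
```

For instance, after posix_fallocate(fd, 0, 16384) succeeds on a new file, range_reads_as_zero(fd, 0, 16384) should return 1.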
> >In ext4 we do this by making the newly preallocated blocks part of
> >"special" extents called "uninitialized extents". We use a single bit
> >in the ee_len field of the extent structure to mark an extent as
> >initialized or uninitialized. If we then get a read request on an
> >uninitialized extent, we return zeroes without having to read the
> >physical blocks.
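A simplified, user-space model of that encoding, in C. It assumes the scheme ext4 uses for the 16-bit ee_len, where a value above 32768 means "uninitialized" and the real length is the value minus 32768 (32768 itself is a full-length initialized extent). The demo_* names are hypothetical; this illustrates the idea and is not kernel code.

```c
#include <stdint.h>
#include <string.h>

/* Constant names follow ext4's; on disk ee_len is little-endian and the
 * real extent struct has more fields, both ignored in this model. */
#define EXT_INIT_MAX_LEN   (1u << 15)             /* 32768 blocks */
#define EXT_UNINIT_MAX_LEN (EXT_INIT_MAX_LEN - 1) /* 32767 blocks */

struct demo_extent {
    uint32_t ee_block;   /* first logical block covered by the extent */
    uint16_t ee_len;     /* length; values above 32768 mean "uninitialized" */
};

static int demo_is_uninit(const struct demo_extent *ex)
{
    return ex->ee_len > EXT_INIT_MAX_LEN;
}

static unsigned demo_actual_len(const struct demo_extent *ex)
{
    return demo_is_uninit(ex) ? ex->ee_len - EXT_INIT_MAX_LEN : ex->ee_len;
}

/* Flag an extent as uninitialized; fails if it is too long to encode. */
static int demo_mark_uninit(struct demo_extent *ex)
{
    unsigned len = demo_actual_len(ex);

    if (len > EXT_UNINIT_MAX_LEN)
        return -1;
    ex->ee_len = (uint16_t)(len + EXT_INIT_MAX_LEN);
    return 0;
}

/* Read one block: an uninitialized extent yields zeroes without touching
 * the (possibly stale) on-disk contents. */
static void demo_read_block(const struct demo_extent *ex,
                            const unsigned char *disk_block,
                            unsigned char *out, size_t blksz)
{
    if (demo_is_uninit(ex))
        memset(out, 0, blksz);
    else
        memcpy(out, disk_block, blksz);
}
```

Because the flag shares the 16 bits with the length, an uninitialized extent can cover at most 32767 blocks, which is why demo_mark_uninit() refuses a 32768-block extent.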
> >Similarly, each filesystem that wants to implement fallocate() should
> >have a good way of doing so in the kernel. If not, it's better to use
> >posix_fallocate() on that filesystem, which writes zeroes to the newly
> >allocated blocks anyway.
> >> I am trying to decide if this is a trivial mapping to cifs transact 2
> >> (to send over the network to e.g. a file mounted from a Windows or
> >> Samba server)
More information about the samba-technical mailing list