[clug] Kernel without initramfs

Fri Mar 27 23:07:50 GMT 2009

Robert Edwards <bob at cs.anu.edu.au> writes:
> Daniel Pittman wrote:
> ...
>>
>> (See the article for details of the operations, but the dataset is a
>>  regular Linux kernel git repository and source tree.)
>>
>> Amount of data written (in megabytes) on an ext4 filesystem
>> Operation     with journal    w/o journal     percent change
>> git clone     367.7           353.0           4.00%
>> make          231.1           203.4           12.0%
>> make clean    14.6            7.7             47.3%
>>
>> Amount of data written (in megabytes) on an ext4 filesystem
>>     mounted with noatime
>> Operation     with journal    w/o journal     percent change
>> git clone     367.0           353.0           3.81%
>> make          207.6           199.4           3.95%
>> make clean    6.45            3.73            42.17%
>>
>>
>> Metadata heavy workloads — delete a lot of stuff, specifically, which
>> *really* sucks on ext3 in terms of I/O writes — might cost close to
>> twice as much, but normal workloads are vastly better.
>
> I have been thinking a bit about these numbers and have come to the
> conclusion that they don't really tell the whole picture in terms of
> number of FLASH "blocks" being erased and re-written. My thinking goes
> like: the actual writing of large chunks of data will generally be
> whole-block writes, whereas the updating of the journal may be small
> chunks of data, but still needs to be done per f/s change. So, although
> the overall number of bytes being written to the journal may be small,
> the number of FLASH blocks being erased and rewritten could still be
> quite large.

That is more or less true, in terms of the problem statement.

> But that is just my thinking - next is to come up with a way of
> measuring what is actually happening at the FLASH chip layer (or, at
> least, at the block layer).

As the other thread says, that varies a lot depending on your SSD.
Notably, the Intel SSDs use a block map, and combine multiple small
writes into a single large write with little regard for the LBA of the
individual blocks.

Other devices don't, and suffer the sort of issue you note: they replace
blocks using some simpler internal strategy, so every 4K write can
result in a 128K read-modify-erase-write cycle.
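
To put rough numbers on that, here is a minimal sketch of the arithmetic.
It assumes 4K writes and 128K erase blocks (the figures above), a "naive"
device that rewrites a whole erase block for every small write, and a
"combining" device that packs small writes together regardless of LBA.
The function names are made up for illustration, and real firmware is
certainly more complicated than either model.

```python
ERASE_BLOCK = 128 * 1024  # assumed erase block size (the 128K figure above)
WRITE_SIZE = 4 * 1024     # assumed size of each small write (4K)

def naive_flash_bytes(offsets, write_size=WRITE_SIZE, erase_block=ERASE_BLOCK):
    """Bytes rewritten if every write forces a full read-modify-erase-write
    of each erase block it touches (hypothetical simple device)."""
    total = 0
    for off in offsets:
        first = off // erase_block
        last = (off + write_size - 1) // erase_block
        total += (last - first + 1) * erase_block
    return total

def combined_flash_bytes(offsets, write_size=WRITE_SIZE, erase_block=ERASE_BLOCK):
    """Bytes written if the device packs small writes together with no
    regard for LBA, rounded up to whole erase blocks (hypothetical)."""
    logical = len(offsets) * write_size
    blocks = -(-logical // erase_block)  # ceiling division
    return blocks * erase_block

# 32 scattered 4K writes, each landing in a different erase block
scattered = [i * ERASE_BLOCK for i in range(32)]
logical = len(scattered) * WRITE_SIZE

naive_amp = naive_flash_bytes(scattered) / logical
combined_amp = combined_flash_bytes(scattered) / logical
print(naive_amp, combined_amp)  # 32.0 1.0
```

So in the worst case the simple device writes 32 times what the file
system asked for, while the combining device writes about the same amount.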

OTOH, that problem is going to be more or less identical on any file
system, on average, since the journal writes are usually reasonably
contiguous block writes...

So, yeah, not that easy, but it certainly gives some solid figures about
how much your write load amplifies with a journal vs without.
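
For what it's worth, the "percent change" column and the matching
with/without ratio can be reproduced from the first table like this. It
is only a sketch: the dictionary just restates the quoted figures for the
default mount options, and the helper name is invented.

```python
# (with journal, without journal) in megabytes written, copied from the
# quoted table for the default mount options
written_mb = {
    "git clone": (367.7, 353.0),
    "make": (231.1, 203.4),
    "make clean": (14.6, 7.7),
}

def journal_cost(with_j, without_j):
    """Return (percent saved by dropping the journal,
    with-journal / without-journal write ratio)."""
    saved_pct = (with_j - without_j) / with_j * 100
    ratio = with_j / without_j
    return saved_pct, ratio

results = {op: journal_cost(*mb) for op, mb in written_mb.items()}
for op, (saved_pct, ratio) in results.items():
    print(f"{op:<12}{saved_pct:6.2f}%  {ratio:5.2f}x")
```

The ratio for "make clean" comes out at roughly 1.9, which is where the
"close to twice as much" for metadata-heavy workloads comes from.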

Regards,
        Daniel

At least, in so far as filesystems written by Ted go. :)