cost of tdb transactions on ext4 with barriers

Fri Feb 12 06:37:51 MST 2010

Olaf Frączyk wrote:
> Please look here, if you didn't do it already:
>
> http://www.mjmwired.net/kernel/Documentation/block/barrier.txt
>
> Best regards,
>
> Olaf
> On Fri, 2010-02-12 at 20:17 +1100, ronnie sahlberg wrote:
>   
>> FS guys have a lot to learn from SCSI.
>>
>> In SCSI you have a flag, FUA, in the rpc header for each individual
>> i/o (WRITE10) that says
>> * you may just write this to cache if you think this benefits performance
>> or
>> * this i/o goes to the medium. period.   and if not, we will make you
>> suffer, publicly, next time we t.10 people meet up.
>>
>> Kind of what you might expect that fsync() would provide on non-broken systems.
>>
>>
>>
>> regards
>> ronnie sahlberg
>>
>>
>> On Fri, Feb 12, 2010 at 8:07 PM, Olaf Fraczyk <olaf at navi.pl> wrote:
>>     
>>> Hello,
>>>
>>> In XFS it depends on whether you have the write cache enabled.
>>> BTW, XFS journals only metadata.
>>>
>>> So to be safe you need:
>>> 1. Write cache off - the barriers are not needed
>>> or
>>> 2. Write cache on, and barriers on
>>>
>>> If you have writeback cache enabled and barriers off, then on power
>>> failure you can get an inconsistent filesystem.
>>>
>>> Most people claim that disabling both barriers and write cache is
>>> better in terms of performance than having both cache and barriers enabled.
>>>
>>> I personally always turn the writeback cache off on my drives.
>>>
>>> The above was for XFS, but I see no reason why it should be any
>>> different for ext4.
>>>
>>> It is safe to use writeback cache and have barriers disabled only if you
>>> can ensure that the write cache (both on the controller and in the drive)
>>> will survive a crash or power outage.
>>>
>>> Best regards,
>>>
>>> Olaf
>>> On Fri, 2010-02-12 at 13:57 +1100, tridge at samba.org wrote:
>>>       
>>>> Hi Ronnie,
>>>>
>>>>  > Do you really need barriers on ext4?
>>>>
>>>> It depends if you want your data to survive a machine crash.
>>>>
>>>> For my development box, I'm happy to risk losing a small amount of
>>>> data on a machine crash. For a production box it's not such a good
>>>> thing.
>>>>
>>>> I'm not aware of anything unique in ext4 that allows it to avoid data
>>>> corruption on system crash with barriers off. Maybe Rusty knows of
>>>> something?
>>>>
>>>> Cheers, Tridge
>>>>
>>>>         
>>> --
>>> Olaf Frączyk <olaf at navi.pl>
>>> NAVI
>>> http://www.navi.pl
>>> http://www.ntp.navi.pl
>>>
>>>
>>>       
>
>   
Amusing: the writer either doesn't know the history of Unix, or is making
a tongue-in-cheek reference to rename() and creat() in the second
sentence of the article:

Tejun Heo <htejun[AT]gmail[DOT]com>, July 22 2005 writes:

I/O barrier requests are used to guarantee ordering around the barrier requests.
Unless you're crazy enough to use disk drives for implementing synchronization
constructs (wow, sounds interesting...), the ordering is meaningful only for
write requests for things like journal checkpoints.

Sigh, another "impedance mismatch" to deal with, only instead of MS-vs-Unix, 
it's competing-Unix-implementation vs Unix-implementation.  

In this case, the filesystem will make ordering guarantees, but only
about entire sequences of operations. So creat, mv and any other
file-based synchronization operation require all data that entered the
cache *before* the sync to be written to disk before the write that
implements the synchronization construct. This queue of data can be
large, and can take significant time to write.
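
To make the ordering requirement concrete, here is a minimal sketch of the
usual write-then-fsync-then-rename pattern, in which an application forces
its data to disk before the rename that acts as the synchronization point.
It assumes a POSIX system and Python; the function name atomic_replace is
made up for illustration and is not taken from tdb or from any filesystem.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the file at `path` with `data` (bytes), data first, rename second.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the rename
    # stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Ask the kernel to push the file data to the device before
            # the rename below becomes visible. This is where the cost
            # lands when a lot of dirty data is queued ahead of it.
            os.fsync(f.fileno())
        # The rename is the "synchronization construct": readers see
        # either the old file or the new one.
        os.rename(tmp, path)
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Also sync the directory so the new name itself is on disk.
    dirfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

Whether the fsync() calls actually reach the medium still depends on the
write cache and barrier settings discussed earlier in the thread.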

You may recognize the same class of issue here as the Linux kernel folks
suffer with locks, which take a relatively long time to get to memory
or through the cache-consistency hardware, and so murder performance.
Thus we see lots of discussion of lock-free algorithms, read-copy-
update (RCU) and transactional memory to address this related problem.

I'd be tempted to drag Mr. Heo into the discussion, if only
to see how deep his tongue is in his cheek...

--dave

[One of the things Thompson and Ritchie deliberately did *not* copy
from Multics to Unix was the locking constructs.
Instead, they implemented mutual exclusion with creat(... O_EXCL|O_CREAT)
and atomic change with link/unlink (mv, rename). Both used the filesystem
as the means of implementing synchronization.
Synchronization is a favorite wheel to reinvent: RCU is just the most
recent attempt to reduce its squareness.
For an interesting discussion on how to do synchronization badly, see
David Tilbrook's classic article on the mv/rename/link problem at
http://www.qef.com/html/docs/rename.pdf.
]
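
For readers who have not seen the idiom, here is a minimal sketch of
filesystem-based mutual exclusion. It assumes a POSIX system and Python,
and it passes O_CREAT|O_EXCL to open(), since creat() itself takes no flags
argument. The function names acquire_lock and release_lock are hypothetical.

```python
import os


def acquire_lock(lock_path):
    """Try to take the lock; return True on success, False if it is held.

    Hypothetical helper for illustration only.
    """
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so exactly one caller can succeed in creating it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    # Recording the owner's PID is a common courtesy, not a requirement.
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True


def release_lock(lock_path):
    """Drop the lock by removing the lock file."""
    os.unlink(lock_path)
```

A caller would loop on acquire_lock() with a sleep between attempts and call
release_lock() when done. Note that a crashed holder leaves a stale lock
file behind, which is one of the ways this approach goes wrong.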

-- 
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb at spamcop.net           |                      -- Mark Twain
(416) 223-8968