[ccache] Buffer size for IO operations is too small
Anders Björklund
anders at itension.se
Sun Apr 3 20:59:29 UTC 2016
Anders Björklund wrote:
> Michael Kolomeytsev wrote:
>> I've discovered that the IO buffer size in ccache is too small: 16k
>> or 10k (in hash_fd, copy_fd, copy_file).
>>
>
> But your observations are very interesting, so please post
> more if you have any. It would also be nice to have some follow-up
> on the observation about ccache problems with multiple cores:
> https://github.com/jrosdahl/ccache/issues/54 (also on OS X)
>
> I'm thinking that hash and copy could do with different macros...
Actually three macros: hash, compress/decompress, and plain old copy.
So I thought I'd split the "copy" case out, away from the other buffers...
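Splitting the single buffer size into three could look roughly like the sketch below. The macro names, the 4096-byte block size and the hash/compress sizes are hypothetical; only the 16384-byte copy size comes from this thread. The static_assert lines reject, at compile time, any size that is not a whole multiple of the assumed block size.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical macro names and values, for illustration only. */
#define ASSUMED_BLOCK_SIZE 4096
#define HASH_BUFFER_SIZE   (16 * 1024)  /* chunks fed to the hash function */
#define ZLIB_BUFFER_SIZE   (16 * 1024)  /* compress/decompress chunks */
#define COPY_BUFFER_SIZE   (16 * 1024)  /* plain uncompressed copies */

/* Fail the build if any buffer is not a whole multiple of the block size. */
static_assert(HASH_BUFFER_SIZE % ASSUMED_BLOCK_SIZE == 0, "hash buffer not block-aligned");
static_assert(ZLIB_BUFFER_SIZE % ASSUMED_BLOCK_SIZE == 0, "zlib buffer not block-aligned");
static_assert(COPY_BUFFER_SIZE % ASSUMED_BLOCK_SIZE == 0, "copy buffer not block-aligned");
```

Keeping them separate means the copy size can be tuned (or made to follow st_blksize) without touching the hash or zlib paths.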
You'd think that copying a file would be a simple thing to do, right?
Actually, on some systems, like Windows or Mac OS X, it is. But on Linux...
I found this interesting blog post, which comes with some benchmarks too:
http://blog.plenz.com/2014-04/so-you-want-to-write-to-a-file-real-fast.html
So the first thing to do would be to make the I/O buffer size a whole
multiple of the block size, that is, 16384 instead of 10240. That
avoids having to do partial page copies later. And then keeping the
buffer in kernel space instead of user space sounded like a good idea.
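The arithmetic behind the block-size point: with 4096-byte blocks, 16384 = 4 × 4096, while 10240 = 2.5 × 4096, so every 10240-byte read starts or ends in the middle of a block. Below is a minimal sketch of the plain read()/write() copy loop that the "read+write" benchmark rows measure, assuming a 4 KiB block size; it is not ccache's actual copy_fd().

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* 16384 = 4 * 4096: whole blocks, assuming a 4 KiB block size.
   (10240 = 2.5 * 4096 would leave every read on a partial block.) */
#define COPY_BUF_SIZE 16384
static_assert(COPY_BUF_SIZE % 4096 == 0, "copy buffer not block-aligned");

/* Copy everything from fd_in to fd_out through a user-space buffer.
   Returns 0 on success, -1 on error (errno set). Retries on EINTR
   and loops over short writes. */
static int copy_fd_rw(int fd_in, int fd_out)
{
    char buf[COPY_BUF_SIZE];
    for (;;) {
        ssize_t n = read(fd_in, buf, sizeof buf);
        if (n == 0)
            return 0;                      /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(fd_out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}
```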
But having to look for various OS/kernel versions of sendfile()? Eww.
Might as well stick with "splice()", since the other main systems
already have their own solutions: Win32 has CopyFile and OS X has copyfile.
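A minimal, Linux-only sketch of the splice() approach. splice() requires a pipe on one side, so the data goes file → pipe → file and never passes through a user-space buffer. This is not the code from the linked branch; the 16384-byte chunk size and the error handling are assumptions.

```c
#define _GNU_SOURCE            /* splice() is a Linux extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define SPLICE_CHUNK 16384     /* assumed bytes per splice() call */

/* Copy fd_in to fd_out through an anonymous pipe using splice().
   Returns 0 on success, -1 on error (errno set). Linux only. */
static int copy_fd_splice(int fd_in, int fd_out)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int rc = 0;
    while (rc == 0) {
        /* Move up to SPLICE_CHUNK bytes from the source into the pipe. */
        ssize_t n = splice(fd_in, NULL, p[1], NULL, SPLICE_CHUNK, SPLICE_F_MOVE);
        if (n == 0)
            break;                         /* EOF on fd_in */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        /* Drain exactly what was just pushed into the pipe. */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, fd_out, NULL, (size_t)n, SPLICE_F_MOVE);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                rc = -1;
                break;
            }
            n -= w;
        }
    }
    close(p[0]);
    close(p[1]);
    return rc;
}
```

On other systems the same role would be filled by CopyFile (Win32) or copyfile (OS X), so a sketch like this would sit behind a Linux-only #ifdef.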
And doing some "advise/allocate" sounded easy, but had pitfalls too.
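One reading of "advise/allocate" is posix_fadvise() on the source and posix_fallocate() on the destination, followed by a final ftruncate() (the "advices + trunc" in the benchmark). The sketch below assumes that reading and is not the branch's code; the helper names are hypothetical. Two known pitfalls: both posix_* calls return an error number instead of setting errno, and glibc's posix_fallocate() falls back to writing into every block when the file system has no native support, which can cost more than it saves.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: hint the kernel before copying `size` bytes.
   Best effort: failures only affect speed, so return values are ignored
   (note they are error numbers, not -1 with errno). */
static void prepare_copy(int fd_in, int fd_out, off_t size)
{
    /* We will read the whole source sequentially. */
    (void)posix_fadvise(fd_in, 0, size, POSIX_FADV_SEQUENTIAL);
    /* Reserve destination space up front; len 0 would be EINVAL. */
    if (size > 0)
        (void)posix_fallocate(fd_out, 0, size);
}

/* Hypothetical helper: cut the destination to exactly `size` bytes, in
   case it was preallocated or previously longer. Returns 0 or -1 (errno). */
static int finish_copy(int fd_out, off_t size)
{
    assert(size >= 0);
    return ftruncate(fd_out, size);
}
```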
Here is the end result, in case anyone is interested in a preview:
https://github.com/jrosdahl/ccache/compare/master...itensionanders:uncompressed
It sounded like a good idea, but it needs some real benchmarks to see
whether it is actually worth it. It should probably check st_blksize too.
The raw I/O can probably be made about twice as fast (e.g. for a 1M file).
The question is: does it make any real impact on the ccache run time?
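Checking st_blksize could look like this: fstat() the file and use the smallest whole multiple of its preferred I/O size that is at least a fixed floor. The 16384-byte floor and the helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

#define MIN_IO_SIZE 16384   /* assumed floor: 4 * 4096 */

/* Round n up to the next multiple of m (m > 0). */
static size_t round_up(size_t n, size_t m)
{
    assert(m > 0);
    return (n + m - 1) / m * m;
}

/* Hypothetical helper: pick an I/O buffer size for fd.
   Returns the smallest multiple of st_blksize that is >= MIN_IO_SIZE,
   or MIN_IO_SIZE if fstat() fails or reports no block size. */
static size_t io_size_for_fd(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_blksize <= 0)
        return MIN_IO_SIZE;
    return round_up(MIN_IO_SIZE, (size_t)st.st_blksize);
}
```

With a 4096-byte st_blksize this gives 16384; a file system that reports 64K would get 65536 instead.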
  method                           timings (ns)          first vs. splice
  pipe+splice + advices + trunc    1175   1283   1290
  read+write 4bs                   1537   2126   2210      +30.8%
  read+write 10k                   2334   2356   2668      +98.6%
  read+write bs                    2515   2692   4591     +114.0%
  (bs = block size)
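The percentages in the table compare each method's first timing with the splice baseline of 1175 ns, i.e. (t / 1175 - 1) × 100. A small helper (hypothetical name) that reproduces that arithmetic:

```c
#include <assert.h>

/* Relative overhead of time t over a baseline, in percent. */
static double overhead_pct(double t, double baseline)
{
    assert(baseline > 0.0);
    return (t / baseline - 1.0) * 100.0;
}
```

For example, overhead_pct(2515, 1175) is about 114.04, matching the "+114.0%" for plain block-size reads.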
But 256K seemed like overkill compared to 16K, at least for plain copy I/O.
There might still be some additional benefit when doing gzip or md4, though.
/Anders
PS. We already gave up on mmap, for other reasons (high maintenance).
https://github.com/jrosdahl/ccache/commit/c358e7c801e265ce07e909d75f3f3fd4e16c7f65