[ccache] Buffer size for IO operations is too small

Sun Apr 3 20:59:29 UTC 2016

Anders Björklund wrote:
> Michael Kolomeytsev wrote:
>> I've discovered that there is too small buffer size for IO in ccache: 16k
>> or 10k
>> (in hash_fd, copy_fd, copy_file).
>>
>
> But your observations are very interesting, and please post
> more if you have it. Would also be nice to have some follow-up
> on the observation about ccache problems with multiple cores:
> https://github.com/jrosdahl/ccache/issues/54 (also on OS X)
>
> I'm thinking that hash and copy could do with different macros...

Actually three macros, hash, compress/decompress and plain old copy.
Thought I'd move the "copy" case aside, away from the other buffers...

You'd think that copying a file would be a simple thing to do, right?
Actually, on some systems like Windows or Mac OS X it is. But not on Linux.

I found this interesting blog post, which came with some benchmarks too:
http://blog.plenz.com/2014-04/so-you-want-to-write-to-a-file-real-fast.html

So the first thing to do would be to make the I/O buffer size into a
whole multiple of the block size, that is: 16384 instead of 10240.
Avoids having to do partial page copies later. And then allocating the
buffer in kernel space instead of user space sounded like a good idea.
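
For reference, the plain user-space variant with a block-multiple buffer
could look roughly like the sketch below. The name copy_fd_rw and the
COPY_BUF_SIZE macro are made up for illustration (this is not the actual
ccache copy_fd), and it assumes a 4096-byte block size, so 16384 is exactly
four blocks.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4096-byte blocks, so 16384 is a whole multiple (4 blocks),
 * whereas 10240 would be 2.5 blocks. */
#define COPY_BUF_SIZE 16384

/* Hypothetical helper: copy everything from fd_in to fd_out with
 * read()/write(). Returns 0 on success, -1 on error (errno is set). */
int copy_fd_rw(int fd_in, int fd_out)
{
    char buf[COPY_BUF_SIZE];

    assert(fd_in >= 0 && fd_out >= 0);
    for (;;) {
        ssize_t n = read(fd_in, buf, sizeof(buf));
        if (n == 0) {
            return 0; /* end of file */
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted, retry the read */
            }
            return -1;
        }
        /* write() may be short, so loop until the chunk is flushed */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(fd_out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return -1;
            }
            off += w;
        }
    }
}
```

Every chunk here still passes through a user-space buffer, which is the
part the kernel-space approach tries to avoid.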

But having to look for various OS/kernel versions of sendfile()? Eww.
Might as well stick with splice(), since the other main systems
have solutions already: Win32 has CopyFile and OS X has copyfile.
And doing some "advise/allocate" sounded easy, but had pitfalls too.
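
To make the pipe + splice idea concrete, here is a minimal Linux-only
sketch. It is not the code from the branch linked below; the function name
copy_fd_splice and the 16384 chunk size are assumptions for illustration.
The advise/allocate calls are treated as hints only and their failures are
ignored, and the output is assumed to be a regular file that was not
opened with O_APPEND.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* splice() is a Linux extension */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SPLICE_CHUNK 16384 /* assumed chunk size, 4 blocks of 4096 */

/* Hypothetical helper: copy fd_in to fd_out through a pipe, so the data
 * stays in kernel space. Returns 0 on success, -1 on error. */
int copy_fd_splice(int fd_in, int fd_out)
{
    struct stat st;
    int p[2];
    int ok = 1;

    assert(fd_in >= 0 && fd_out >= 0);
    if (fstat(fd_in, &st) != 0 || pipe(p) != 0) {
        return -1;
    }

    /* Hints only. posix_fallocate() can be slow on filesystems where
     * glibc has to emulate it, which is one of the pitfalls. */
    (void)posix_fadvise(fd_in, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (st.st_size > 0) {
        (void)posix_fallocate(fd_out, 0, st.st_size);
    }

    while (ok) {
        /* file -> pipe */
        ssize_t n = splice(fd_in, NULL, p[1], NULL, SPLICE_CHUNK,
                           SPLICE_F_MOVE);
        if (n == 0) {
            break; /* end of file */
        }
        if (n < 0) {
            if (errno != EINTR) {
                ok = 0;
            }
            continue;
        }
        /* pipe -> file, draining everything that was queued */
        while (ok && n > 0) {
            ssize_t w = splice(p[0], NULL, fd_out, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (w > 0) {
                n -= w;
            } else if (w == 0 || errno != EINTR) {
                ok = 0;
            }
        }
    }
    close(p[0]);
    close(p[1]);

    if (ok) {
        /* Trim any preallocated space beyond what was actually written */
        off_t end = lseek(fd_out, 0, SEEK_CUR);
        if (end < 0 || ftruncate(fd_out, end) != 0) {
            ok = 0;
        }
    }
    return ok ? 0 : -1;
}
```

A real version would probably want a read+write fallback for the cases
where splice() fails with EINVAL.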

Here is the end result, in case anyone is interested in a preview:
https://github.com/jrosdahl/ccache/compare/master...itensionanders:uncompressed

It sounded like a good idea, but needs some actual benchmarks to see
whether it was actually worth it. Probably should check st_blksize too.
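
Checking st_blksize could be as simple as rounding the wanted buffer size
up to the next multiple of whatever fstat() reports. A rough sketch, where
both function names are invented for this example and not part of ccache:

```c
#include <stddef.h>
#include <sys/stat.h>

/* Round `want` up to the next whole multiple of block size `bs`.
 * A block size of 0 leaves the value unchanged. */
size_t round_up_to_block(size_t want, size_t bs)
{
    if (bs == 0) {
        return want;
    }
    return ((want + bs - 1) / bs) * bs;
}

/* Hypothetical helper: pick an I/O buffer size for `fd` that is a whole
 * multiple of its preferred block size (st_blksize). Falls back to
 * `want` unchanged if fstat() fails. */
size_t io_buf_size_for(int fd, size_t want)
{
    struct stat st;

    if (fstat(fd, &st) != 0 || st.st_blksize <= 0) {
        return want;
    }
    return round_up_to_block(want, (size_t)st.st_blksize);
}
```

With a 4096-byte block size, 10240 would be rounded up to 12288, while
16384 is already a whole multiple and stays as it is.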

The actual I/O can probably be made twice as fast (e.g. for a 1M file).
Question is whether it makes any real impact on the ccache run time?

pipe+splice + advices + trunc   1175ns  1283ns  1290ns  (baseline)
read+write 4bs                  1537ns  2126ns  2210ns  (+ 30.8%)
read+write 10k                  2334ns  2356ns  2668ns  (+ 98.6%)
read+write bs                   2515ns  2692ns  4591ns  (+ 114.0%)

But 256K seemed like overkill (over 16K), at least for plain copy I/O.
Might still be some additional benefits when doing gzip or md4, though.

/Anders

PS. We gave up on mmap already, for other reasons (high maintenance)
https://github.com/jrosdahl/ccache/commit/c358e7c801e265ce07e909d75f3f3fd4e16c7f65