[ccache] Corrupt objects from three colliding compiles

Wilson Snyder wsnyder at wsnyder.org
Thu May 6 11:14:17 MDT 2010


I'm hitting a NFS issue causing bad caching.  (And yes, we have
no_subtree_check.)

This is a bug present in 2.4, but it is also present in
trunk b964c173cf493a1c0c0749a9c6a39bc07936507b which is used
for the below diagnosis.

The issue seems to occur only when there are three compiles
on different systems all compiling the same file at the same
time.  Here's the C files:

$ find . -name INV.c -exec ls -la --time-style=full-iso {} \;
-rw-rw-r-- 1 wsnyder users 405 2010-05-06 11:38:39.000000000 -0400 ./dir02/INV.c
-rw-rw-r-- 1 wsnyder users 405 2010-05-06 11:48:57.000000000 -0400 ./dir03/INV.c
-rw-rw-r-- 1 wsnyder users 405 2010-05-06 11:43:08.000000000 -0400 ./dir04/INV.c

They got compiled, but one got truncated

$ find . -name INV.o -exec ls -la --time-style=full-iso {} \;
-rw-rw-r-- 1 wsnyder users 916 2010-05-06 11:56:44.000000000 -0400 ./dir02/INV.o
-rw-rw-r-- 1 wsnyder users 916 2010-05-06 11:56:44.000000000 -0400 ./dir03/INV.o
-rw-rw-r-- 1 wsnyder users   0 2010-05-06 11:56:44.000000000 -0400 ./dir04/INV.o

Note the timestamps (unfortunately this isn't a file system
with sub-second resolution!)

The cache files:

$ ls -la --time-style=full-iso $CCACHE_DIR/2/b/2/eedf0591498af8e5f3a*
-rw-rw-r-- 1 wsnyder users 916 2010-05-06 11:56:44.000000000 -0400 $CCACHE_DIR/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o

$ ls -la --time-style=full-iso $CCACHE_DIR/0/3/8/f19fe0a63fc088a9c57*
-rw-rw-r-- 1 wsnyder users 874 2010-05-06 11:56:44.000000000 -0400 $CCACHE_DIR/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest

To debug this further I added hostname and timestamps to the
log files (earlier email sent this patch), otherwise I
couldn't know what the ordering was.  This gave three
reports:

[1273161404.432174,2424 ] === 2010-05-06 11:56:44 ===
[1273161404.432636,2424 ] Hostname: maserati
[1273161404.432643,2424 ] Cwd: /nfs/compiling/dir02
[1273161404.432646,2424 ] Source file: INV.c
[1273161404.432648,2424 ] Object file: INV.o
[1273161404.432657,2424 ] Trying direct lookup
[1273161404.439730,2424 ] Looking for object file hash in /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.440126,2424 ] Did not find object file hash in manifest
[1273161404.440134,2424 ] Running preprocessor
[1273161404.487509,2424 ] Got object file hash from preprocessor
[1273161404.492494,2424 ] Object file /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o not in cache
[1273161404.492522,2424 ] Running real compiler
[1273161404.576412,2424 ] Stored in cache: /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.596098,2424 ] Created INV.o from /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.638443,2424 ] Added object file hash to /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest

[1273161404.469188,2084 ] === 2010-05-06 11:56:44 ===
[1273161404.469515,2084 ] Hostname: bird07
[1273161404.469519,2084 ] Cwd: /nfs/compiling/dir03
[1273161404.469521,2084 ] Source file: INV.c
[1273161404.469523,2084 ] Object file: INV.o
[1273161404.469532,2084 ] Trying direct lookup
[1273161404.471093,2084 ] Looking for object file hash in /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.475427,2084 ] Did not find object file hash in manifest
[1273161404.475434,2084 ] Running preprocessor
[1273161404.523658,2084 ] Got object file hash from preprocessor
[1273161404.542730,2084 ] Created INV.o from /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.569955,2084 ] Added object file hash to /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.570387,2084 ] Succeded getting cached result

[1273161404.511123,21936] === 2010-05-06 11:56:44 ===
[1273161404.511506,21936] Hostname: bat05
[1273161404.511510,21936] Cwd: /nfs/compiling/dir04
[1273161404.511513,21936] Source file: INV.c
[1273161404.511515,21936] Object file: INV.o
[1273161404.511523,21936] Trying direct lookup
[1273161404.514058,21936] Looking for object file hash in /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.517648,21936] Did not find object file hash in manifest
[1273161404.517654,21936] Running preprocessor
[1273161404.564688,21936] Got object file hash from preprocessor
[1273161404.588776,21936] Created INV.o from /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o   ***BAD OBJECT***
[1273161404.630619,21936] Added object file hash to /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.634864,21936] Succeded getting cached result

The next step assumes our times are close across systems; they should be within about 12ms.  Here's the sorted view

[1273161404.432174,2424 ] === 2010-05-06 11:56:44 ===
[1273161404.432636,2424 ] Hostname: maserati
[1273161404.432643,2424 ] Cwd: /nfs/compiling/dir02
[1273161404.432646,2424 ] Source file: INV.c
[1273161404.432648,2424 ] Object file: INV.o
[1273161404.432657,2424 ] Trying direct lookup
[1273161404.439730,2424 ] Looking for object file hash in /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.440126,2424 ] Did not find object file hash in manifest
[1273161404.440134,2424 ] Running preprocessor
[1273161404.469188,2084 ] === 2010-05-06 11:56:44 ===
[1273161404.469515,2084 ] Hostname: bird07
[1273161404.469519,2084 ] Cwd: /nfs/compiling/dir03
[1273161404.469521,2084 ] Source file: INV.c
[1273161404.469523,2084 ] Object file: INV.o
[1273161404.469532,2084 ] Trying direct lookup
[1273161404.471093,2084 ] Looking for object file hash in /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.475427,2084 ] Did not find object file hash in manifest
[1273161404.475434,2084 ] Running preprocessor
[1273161404.487509,2424 ] Got object file hash from preprocessor
[1273161404.492494,2424 ] Object file /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o not in cache
[1273161404.492522,2424 ] Running real compiler
[1273161404.511123,21936] === 2010-05-06 11:56:44 ===
[1273161404.511506,21936] Hostname: bat05
[1273161404.511510,21936] Cwd: /nfs/compiling/dir04
[1273161404.511513,21936] Source file: INV.c
[1273161404.511515,21936] Object file: INV.o
[1273161404.511523,21936] Trying direct lookup
[1273161404.514058,21936] Looking for object file hash in /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.517648,21936] Did not find object file hash in manifest
[1273161404.517654,21936] Running preprocessor
[1273161404.523658,2084 ] Got object file hash from preprocessor
[1273161404.542730,2084 ] Created INV.o from /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.564688,21936] Got object file hash from preprocessor
[1273161404.569955,2084 ] Added object file hash to /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.570387,2084 ] Succeded getting cached result
[1273161404.576412,2424 ] Stored in cache: /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.588776,21936] Created INV.o from /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o   ***BAD OBJECT***
[1273161404.596098,2424 ] Created INV.o from /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.630619,21936] Added object file hash to /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.634864,21936] Succeded getting cached result
[1273161404.638443,2424 ] Added object file hash to /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest


It looks like the third job's read of the cache is stuck in
between the first's write and the second's.  Note that times
may be off by >8ms, this would put the bad object read right
on top of the object update by the second compile when it
hits the NFS server.

Ideas?


More information about the ccache mailing list