[distcc] Fwd: Repeatable .o and .so checksums with distcc

Jeff Kilpatrick kilpatrick.jeff at gmail.com
Tue Jun 29 13:03:34 MDT 2010


I still haven't cracked this riddle. Here's some additional debugging
information:

distccd[22108] (dcc_scan_args) found input file
"/proj/Platform/SGF/Source/Common/BankBill.cpp"
distccd[22108] (dcc_scan_args) found object/output file
"objs/Release/BankBill.o"
distccd[22108] compile from BankBill.cpp to BankBill.o
distccd[22108] (dcc_run_job) temp input file (null)
distccd[22108] (dcc_run_job) original input file
/proj/Platform/SGF/Source/Common/BankBill.cpp
distccd[22108] (dcc_input_tmpnam) input file
/proj/Platform/SGF/Source/Common/BankBill.cpp
distccd[22108] (dcc_run_job) temp input file /tmp/distccd_42d13927.ii
distccd[22108] (dcc_r_token_int) got DOTI000c8925
distccd[22108] (dcc_r_file) received 821541 bytes to file
/tmp/distccd_42d13927.ii
distccd[22108] (dcc_r_file_timed) 821541 bytes received in 0.002498s, rate
321171kB/s
distccd[22108] (dcc_set_input) changed input from
"/proj/Platform/SGF/Source/Common/BankBill.cpp" to
"/tmp/distccd_42d13927.ii"
distccd[22108] (dcc_set_input) command after: cc -fexceptions -frtti -fPIC
-fno-defer-pop -fno-strict-aliasing -Wall -Wno-unknown-pragmas -Winvalid-pch
-Werror -O3 -g -fvisibility=hidden -c /tmp/distccd_42d13927.ii -o
objs/Release/BankBill.o
distccd[22108] (dcc_set_output) changed output from
"objs/Release/BankBill.o" to "/tmp/distccd_4dc23927.o"
distccd[22108] (dcc_set_output) command after: cc -fexceptions -frtti -fPIC
-fno-defer-pop -fno-strict-aliasing -Wall -Wno-unknown-pragmas -Winvalid-pch
-Werror -O3 -g -fvisibility=hidden -c /tmp/distccd_42d13927.ii -o
/tmp/distccd_4dc23927.o
distccd[22108] (dcc_run_job) 2. temp input file /tmp/distccd_42d13927.ii
distccd[22108] (dcc_check_compiler_masq) /usr/bin/cc is not a symlink
distccd[22108] (dcc_spawn_child) forking to execute: cc -fexceptions -frtti
-fPIC -fno-defer-pop -fno-strict-aliasing -Wall -Wno-unknown-pragmas
-Winvalid-pch -Werror -O3 -g -fvisibility=hidden -c /tmp/distccd_42d13927.ii
-o /tmp/distccd_4dc23927.o
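
The dcc_set_output lines in the log show the server swapping the file name that follows "-o" for a temporary one. Below is a minimal sketch of that kind of argv substitution. The helper name set_output_arg is hypothetical and this is not distcc's actual dcc_set_output() implementation, which may handle more cases (for example "-ofile" written as a single argument).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT distcc's dcc_set_output(): replace the
 * argument that follows "-o" with new_path.
 * argv must be NULL-terminated; the caller keeps ownership of new_path.
 * Returns 0 on success, -1 if no "-o <file>" pair is present. */
int set_output_arg(const char **argv, const char *new_path)
{
    size_t i;

    for (i = 0; argv[i] != NULL; i++) {
        if (strcmp(argv[i], "-o") == 0 && argv[i + 1] != NULL) {
            argv[i + 1] = new_path;   /* only the pointer is swapped */
            return 0;
        }
    }
    return -1;
}
```

Called on the command from the log with "/tmp/distccd_4dc23927.o" as new_path, this would turn "-o objs/Release/BankBill.o" into "-o /tmp/distccd_4dc23927.o", which is the name gcc then sees.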

It seems that if I change temp_o or temp_i, it can't find the files.
In my case, I don't need the protection that unique temporary file names
provide. Is it possible to keep the full paths to the files the same, just on
the other server? I imagine I would have to set TMPDIR to /, and update all
the temp_o and temp_i references to keep them at orig_output /
orig_input.

Would there be any more to this, or am I completely missing something?
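
To check whether a given change really keeps the temporary names out of an object file, the file can be scanned for the "distccd_" prefix. The following is only a sketch under stated assumptions: the function names buffer_contains and file_contains are made up for illustration and are not part of distcc, and the whole file is read into memory for simplicity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if buf[0..len) contains the marker string, 0 otherwise.
 * Object files contain NUL bytes, so strstr() cannot be used here. */
int buffer_contains(const unsigned char *buf, size_t len, const char *marker)
{
    size_t mlen = strlen(marker);
    size_t i;

    if (mlen == 0 || len < mlen)
        return 0;
    for (i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return 1;
    }
    return 0;
}

/* Return 1 if the file at path contains marker, 0 if it does not,
 * and -1 if the file cannot be opened or read. */
int file_contains(const char *path, const char *marker)
{
    FILE *fp = fopen(path, "rb");
    unsigned char *buf;
    long size;
    int found;

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0
        || fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL || fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        fclose(fp);
        return -1;
    }
    fclose(fp);
    found = buffer_contains(buf, (size_t)size, marker);
    free(buf);
    return found;
}
```

A build script could call file_contains("objs/Release/BankBill.o", "distccd_") after each compile and flag any object for which the result is 1, before comparing checksums.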

-Jeff

On Tue, Jun 29, 2010 at 12:54 PM, Jeff Kilpatrick <kilpatrick.jeff at gmail.com> wrote:

> You are correct; I did miss some spots that I've copied into this email.
> I've done a few more changes locally, and am continuing my testing.
>
> The code paste wasn't intended to be a patch. I'm a non-programmer
> (integration engineering only), so I wouldn't think any of my changes would
> be up to par for an official submit.
>
> -Jeff
>
>
> On Tue, Jun 29, 2010 at 12:50 PM, Fergus Henderson <fergus at google.com> wrote:
>
>>
>> On Tue, Jun 29, 2010 at 11:15 AM, Jeff Kilpatrick <
>> kilpatrick.jeff at gmail.com> wrote:
>>
>>> Hey Fergus.
>>>
>>> You are correct about the "another problem which may happen".  I applied
>>> the fix you suggested, and set the temp_o and temp_i back to orig_output and
>>> orig_input through the dcc_set_output() calls, and I am now getting
>>> consistent checksums. I will be doing builds all through the afternoon to
>>> confirm checksums match every single time.
>>>
>>> Thank you all so very much. You have literally saved us thousands of
>>> hours in compile time, per week.
>>>
>>> -Jeff
>>>
>>> My changes:
>>>
>>> serve.c:
>>>
>>>     if (cpp_where == DCC_CPP_ON_SERVER) {
>>>         if (dcc_r_many_files(in_fd, temp_dir, compr)
>>> //            || dcc_set_output(argv, temp_o)
>>>             || dcc_set_output(argv, orig_output)
>>>             || tweak_arguments_for_server(argv, temp_dir, deps_fname,
>>>                                           &dotd_target, &tweaked_argv))
>>>             goto out_cleanup;
>>>
>>>         if ((ret = dcc_r_token_file(in_fd, "DOTI", temp_i, compr))
>>>             || (ret = dcc_set_input(argv, orig_input))
>>>             || (ret = dcc_set_output(argv, orig_output)))
>>>
>>> //            || (ret = dcc_set_input(argv, temp_i))
>>> //            || (ret = dcc_set_output(argv, temp_o)))
>>>             goto out_cleanup;
>>
>>
>> When posting patches to the mailing list, please use "svn diff" or "diff
>> -u".
>> If that's all you've changed, I don't think your patch is correct.
>> You'd need to also update the code which sends the object file back to the
>> client:
>>
>>        if ((ret = dcc_x_file(out_fd, temp_o, "DOTO", compr, NULL)))
>>             goto out_cleanup;
>>
>> Also, I think your change may cause problems in non-pump mode if two
>> different clients attempt to compile the same object file at the same time.
>>
>> Cheers,
>>   Fergus.
>>
>>
>>>
>>> On Tue, Jun 29, 2010 at 11:59 AM, Fergus Henderson <fergus at google.com> wrote:
>>>
>>>> On Tue, Jun 29, 2010 at 9:52 AM, Jeff Kilpatrick <
>>>> kilpatrick.jeff at gmail.com> wrote:
>>>>
>>>>> Yes, I have tried both pump and regular mode, and both behave the same
>>>>> way.
>>>>>
>>>>
>>>> Well, I don't think it is exactly the same way.  In the non-pump case,
>>>> distcc does the preprocessing locally, sends the ".ii" file to the server,
>>>> and the server then invokes gcc with the name of the ".ii" file, e.g.
>>>> /tmp/distccd_ac31c96a.ii... that is what gcc ends up embedding in the object
>>>> file.
>>>> In the pump case, the source file names used on the server are the same
>>>> as the source file names used on the client, so the problem in your original
>>>> email won't happen in that case.
>>>>
>>>> But there is another problem which may happen in both cases:
>>>> distcc changes the command line on the server to use a different object
>>>> file name, e.g. "-o ./tmp/distccd_ac31c96a.o",
>>>> and gcc may embed the name of the object file in the object file.
>>>> In the non-pump case, this changing of the object file name is needed to
>>>> ensure that two different distcc invocations on the same server don't try to
>>>> write to the same file.
>>>> But in the pump case, where the compilation is being invoked in a
>>>> temporary directory, I don't think it is actually necessary to change the
>>>> object file name...
>>>> I think the code to do that has just been inherited for historical
>>>> reasons from the non-pump case.
>>>> So it may be possible to modify distcc to avoid doing that in the pump
>>>> case.
>>>> The code which changes the object file name is in the dcc_run_job()
>>>> function in src/serve.c (look in particular for the calls to
>>>> dcc_set_output(), but other parts of the function would need modification
>>>> too).
>>>> But I guess if you're not going to be using pump mode, that wouldn't
>>>> help you.
>>>>
>>>> You may find that the object files are more deterministic if you don't
>>>> pass the "-g" flag to the compiler.
>>>>
>>>> Cheers,
>>>>    Fergus.
>>>>
>>>>
>>>>> A lot of the projects that I will be compiling include boost, and I
>>>>> believe that the pump fails on those, and falls back to regular mode.
>>>>>
>>>>> -Jeff
>>>>>
>>>>>
>>>>> On Tue, Jun 29, 2010 at 10:48 AM, Fergus Henderson <fergus at google.com> wrote:
>>>>>
>>>>>> Did you try using pump mode?
>>>>>> That should give you a better build speed-up and may also avoid this
>>>>>> issue.
>>>>>>
>>>>>> On Jun 29, 2010 6:32 AM, "Jeff Kilpatrick" <kilpatrick.jeff at gmail.com>
>>>>>> wrote:
>>>>>> > Oops, my original response went directly to Ihar, rather than to the
>>>>>> list.
>>>>>> >
>>>>>> > ----
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > Thank you for your response.
>>>>>> >
>>>>>> > We do have a tool internally that could 'scrub' the object file of
>>>>>> its
>>>>>> > dynamic symbols, and could be adapted for this purpose. However, I'm
>>>>>> > hesitant to modify anything with the .o and .so with an external
>>>>>> tool, as in
>>>>>> > some cases, it may be hiding a legitimate issue. Once an exception
>>>>>> makes it
>>>>>> > into the code, it's tempting to continue adding exceptions to fix
>>>>>> issues.
>>>>>> > Before you know it, you have 600 branches with unique 'fixes' to
>>>>>> them :)
>>>>>> >
>>>>>> > Once we get a consistent checksum on the .o and .so files, they'll
>>>>>> be
>>>>>> > packaged into a .iso, which will also need to be repeatable. This
>>>>>> can be
>>>>>> > challenging as well, since attributes on the files can affect the
>>>>>> final
>>>>>> > checksum.
>>>>>> >
>>>>>> > -Jeff
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Jun 29, 2010 at 6:58 AM, Ihar `Philips` Filipau <
>>>>>> > thephilips at gmail.com> wrote:
>>>>>> >
>>>>>> >> Hi Jeff!
>>>>>> >>
>>>>>> >> You can try to collect the check-sum only for the ELF segments
>>>>>> which are
>>>>>> >> actually derived from the source code, omitting the segments
>>>>>> with the
>>>>>> >> extra compiler's info. I do not know any ready tool for the
>>>>>> purpose, but
>>>>>> >> coding something like this - print on stdout all segments except
>>>>>> the
>>>>>> >> black-listed - shouldn't be too complicated.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Tue, Jun 29, 2010 at 11:41 AM, Jeff Kilpatrick <
>>>>>> >> kilpatrick.jeff at gmail.com> wrote:
>>>>>> >>
>>>>>> >>> Thank you for your response.
>>>>>> >>>
>>>>>> >>> Yes, this is the only difference in the object file. We've taken
>>>>>> great
>>>>>> >>> pains over the last few years, removing anything that would cause
>>>>>> checksums
>>>>>> >>> to mismatch.
>>>>>> >>>
>>>>>> >>> I will do some research myself, and talk to a few developers to
>>>>>> see if
>>>>>> >>> they can help me.
>>>>>> >>>
>>>>>> >>> Thanks
>>>>>> >>> -Jeff
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Tue, Jun 29, 2010 at 1:32 AM, Martin Pool <mbp at sourcefrog.net>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>>> On 29 June 2010 13:02, Jeff Kilpatrick <
>>>>>> kilpatrick.jeff at gmail.com>
>>>>>> >>>> wrote:
>>>>>> >>>> > Hello,
>>>>>> >>>> >
>>>>>> >>>> > At my work, we've just begun to investigate how much of an
>>>>>> impact that
>>>>>> >>>> > distcc will have on our builds.
>>>>>> >>>> >
>>>>>> >>>> > We typically perform 200 builds a week, ranging from a thousand
>>>>>> lines
>>>>>> >>>> of
>>>>>> >>>> > code, up to 600,000 lines of code each. Our back end build
>>>>>> scripts are
>>>>>> >>>> based
>>>>>> >>>> > on python, and use Linux make to build. We are running VMWare
>>>>>> images on
>>>>>> >>>> a
>>>>>> >>>> > blade cluster, and each of our three new build servers has
>>>>>> >>>> 20GHz
>>>>>> >>>> processing
>>>>>> >>>> > power, with 4G of RAM. Our primary build environments are loop
>>>>>> back
>>>>>> >>>> ISOs,
>>>>>> >>>> > from a central CIFS server, and are unioned together with
>>>>>> unionfs. Our
>>>>>> >>>> > source code is then copied into this environment, and we
>>>>>> proceed with
>>>>>> >>>> our
>>>>>> >>>> > build, using chroot to enter our build environment. Our
>>>>>> 'distcc'
>>>>>> >>>> machines
>>>>>> >>>> > use the same loop back system, with only our OS and distcc
>>>>>> being
>>>>>> >>>> accessible.
>>>>>> >>>>
>>>>>> >>>> That's pretty cool.
>>>>>> >>>>
>>>>>> >>>> > One of the most important things for our builds, due to the
>>>>>> market that
>>>>>> >>>> we
>>>>>> >>>> > are in, is that our builds must be reproducible, with
>>>>>> repeatable
>>>>>> >>>> md5sums on
>>>>>> >>>> > our shared objects, based on the same label and same
>>>>>> dependencies. In
>>>>>> >>>> our
>>>>>> >>>> > recent tests, we were able to take a particular build from 24
>>>>>> minutes
>>>>>> >>>> to 14
>>>>>> >>>> > minutes, then finally 5 minutes, using distcc and adjusting our
>>>>>> VMs.
>>>>>> >>>> > However, when performing an md5sum on our final shared objects
>>>>>> / object
>>>>>> >>>> > files, the checksums change every build. We dropped down to
>>>>>> just using
>>>>>> >>>> g++
>>>>>> >>>> > to perform our linking, all locally, but our object files are
>>>>>> still
>>>>>> >>>> > mismatching.
>>>>>> >>>> >
>>>>>> >>>> > In the object files' `objdump -s` output, it appears that an
>>>>>> entry is
>>>>>> >>>> being
>>>>>> >>>> > made into all our object files with the following syntax
>>>>>> >>>> "distccd_XXXXX",
>>>>>> >>>> > with XXXXX being a seemingly random combination of characters.
>>>>>> >>>>
>>>>>> >>>> Hi Jeff,
>>>>>> >>>>
>>>>>> >>>> I think this is coming from gcc recording the input file name in
>>>>>> the
>>>>>> >>>> object file. distccd_xxxx.ii is the temporary file name used on
>>>>>> the
>>>>>> >>>> server.
>>>>>> >>>>
>>>>>> >>>> > In the same object file, compiled locally without distcc, we
>>>>>> get a
>>>>>> >>>> rather
>>>>>> >>>> > generic <built-in> placeholder.
>>>>>> >>>>
>>>>>> >>>> I think this means it's coming from the builtin preprocessor.
>>>>>> >>>>
>>>>>> >>>> I probably won't have time to work on this myself but if you have
>>>>>> a
>>>>>> >>>> programmer interested in it there are two possible avenues:
>>>>>> >>>>
>>>>>> >>>> - make gcc read from a file called <built-in> in a temporary
>>>>>> subdirectory
>>>>>> >>>>
>>>>>> >>>> - find some way to stop it recording the compiler input file name
>>>>>> >>>>
>>>>>> >>>> Is that the only difference in the object files? It's pretty
>>>>>> common
>>>>>> >>>> for compilers to also record something about the time the
>>>>>> compilation
>>>>>> >>>> was run or for source files to build this in, which would mean
>>>>>> they
>>>>>> >>>> change every time.
>>>>>> >>>>
>>>>>> >>>> >
>>>>>> >>>> > I've reviewed the source code for distcc, and seen a few
>>>>>> references to
>>>>>> >>>> this
>>>>>> >>>> > distccd_xxxxx. Unfortunately, I'm not a programmer, and thus am
>>>>>> at a
>>>>>> >>>> loss on
>>>>>> >>>> > how to further troubleshoot this, or even if it's possible to
>>>>>> get
>>>>>> >>>> consistent
>>>>>> >>>> > checksums with distcc.
>>>>>> >>>> >
>>>>>> >>>> >
>>>>>> >>>> > Versions
>>>>>> >>>> > =======
>>>>>> >>>> > g++ (Gentoo 4.3.2-r4 p1.8, pie-10.1.5) 4.3.2
>>>>>> >>>> >
>>>>>> >>>> > distcc 3.1 i686-pc-linux-gnu
>>>>>> >>>> > (protocols 1, 2 and 3) (default port 3632)
>>>>>> >>>> > built Mar 29 2010 10:55:35
>>>>>> >>>> >
>>>>>> >>>> > Kernel: 2.6.9-89.ELsmp
>>>>>> >>>> >
>>>>>> >>>> > Command being issued:
>>>>>> >>>> > DISTCC_VERBOSE=1 make -j24 CXX="distcc"
>>>>>> >>>> >
>>>>>> >>>> > Here's the partial output of objdump -s:
>>>>>> >>>> > 04f0 00030000 5f6d6f76 655f636f 6e737472 ...._move_constr
>>>>>> >>>> > 0500 7563745f 66776b2e 68000300 00474454 uct_fwk.h....GDT
>>>>>> >>>> > 0510 79706573 2e68000a 00007365 72646566 ypes.h....serdef
>>>>>> >>>> > 0520 732e6800 01000073 75666669 782e6870 s.h....suffix.hp
>>>>>> >>>> > 0530 70000b00 00646973 74636364 5f616333 p....distccd_ac3
>>>>>> >>>> > 0540 31633936 612e6969 000c0000 61646c5f 1c96a.ii....adl_
>>>>>> >>>> > 0550 62617272 6965722e 68707000 0d000062 barrier.hpp....b
>>>>>> >>>> > 0560 6f6f6c5f 6677642e 68707000 0e000069 ool_fwd.hpp....i
>>>>>> >>>> > 0570 6e746567 72616c5f 635f7461 672e6870 ntegral_c_tag.hp
>>>>>> >>>> > 0580 70000e00 00766f69 645f6677 642e6870 p....void_fwd.hp
>>>>>> >>>> >
>>>>>> >>>> > Thank you for reviewing my issue.
>>>>>> >>>> >
>>>>>> >>>> > -Jeff
>>>>>> >>>> >
>>>>>> >>>> > __
>>>>>> >>>> > distcc mailing list http://distcc.samba.org/
>>>>>> >>>> > To unsubscribe or change options:
>>>>>> >>>> > https://lists.samba.org/mailman/listinfo/distcc
>>>>>> >>>> >
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> --
>>>>>> >>>> Martin
>>>>>> >>>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Don't walk behind me, I may not lead.
>>>>>> >> Don't walk in front of me, I may not follow.
>>>>>> >> Just walk beside me and be my friend.
>>>>>> >> -- Albert Camus (attributed to)
>>>>>> >>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Fergus Henderson <fergus at google.com>
>>>>
>>>
>>>
>>
>>
>> --
>> Fergus Henderson <fergus at google.com>
>>
>
>