[distcc] Fwd: Repeatable .o and .so checksums with distcc

Jeff Kilpatrick kilpatrick.jeff at gmail.com
Tue Jun 29 09:54:09 MDT 2010


You are correct; I did miss some spots that I've copied into this email.
I've done a few more changes locally, and am continuing my testing.

The code paste wasn't intended to be a patch. I'm a non-programmer
(integration engineering only), so I wouldn't think any of my changes would
be up to par for an official submit.

-Jeff

On Tue, Jun 29, 2010 at 12:50 PM, Fergus Henderson <fergus at google.com> wrote:

>
> On Tue, Jun 29, 2010 at 11:15 AM, Jeff Kilpatrick <
> kilpatrick.jeff at gmail.com> wrote:
>
>> Hey Fergus.
>>
>> You are correct about the "another problem which may happen".  I applied
>> the fix you suggested, replacing temp_i and temp_o with orig_input and
>> orig_output in the dcc_set_input() and dcc_set_output() calls, and I am now
>> getting consistent checksums. I will be doing builds all through the
>> afternoon to confirm checksums match every single time.
>>
>> Thank you all so very much. You have literally saved us thousands of hours
>> in compile time, per week.
>>
>> -Jeff
>>
>> My changes:
>>
>> serve.c:
>>
>>     if (cpp_where == DCC_CPP_ON_SERVER) {
>>         /* Was: dcc_set_output(argv, temp_o) */
>>         if (dcc_r_many_files(in_fd, temp_dir, compr)
>>             || dcc_set_output(argv, orig_output)
>>             || tweak_arguments_for_server(argv, temp_dir, deps_fname,
>>                                           &dotd_target, &tweaked_argv))
>>             goto out_cleanup;
>>
>>         /* Was: dcc_set_input(argv, temp_i), dcc_set_output(argv, temp_o) */
>>         if ((ret = dcc_r_token_file(in_fd, "DOTI", temp_i, compr))
>>             || (ret = dcc_set_input(argv, orig_input))
>>             || (ret = dcc_set_output(argv, orig_output)))
>>             goto out_cleanup;
>
>
> When posting patches to the mailing list, please use "svn diff" or "diff
> -u".
> If that's all you've changed, I don't think your patch is correct.
> You'd need to also update the code which sends the object file back to the
> client:
>
>        if ((ret = dcc_x_file(out_fd, temp_o, "DOTO", compr, NULL)))
>             goto out_cleanup;
>
> Also, I think your change may cause problems in non-pump mode if two
> different clients attempt to compile the same object file at the same time.
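
Both points above (sending back the right file, and avoiding collisions in
non-pump mode) might be handled by choosing the server-side object path in
one place. A minimal sketch, not actual distcc code: enum cpp_where and
choose_server_output() are hypothetical names, and it assumes, per the
earlier note in this thread, that pump-mode jobs run in their own temporary
directory.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two preprocessing modes discussed here. */
enum cpp_where { CPP_ON_CLIENT, CPP_ON_SERVER };

/*
 * Pick the object file path the server should compile to.
 * With cpp on the server (pump mode) the original name is kept, on the
 * assumption that the job runs in its own temporary directory.
 * Otherwise the unique temporary name is used, so two concurrent jobs
 * for the same object file cannot overwrite each other.
 */
const char *choose_server_output(enum cpp_where where,
                                 const char *orig_output,
                                 const char *temp_o)
{
    if (where == CPP_ON_SERVER && orig_output != NULL)
        return orig_output;
    return temp_o;
}
```

The returned path would then have to be used consistently, both in the
dcc_set_output() call and in the dcc_x_file() call that sends the object
file back.
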
>
> Cheers,
>   Fergus.
>
>
>>
>> On Tue, Jun 29, 2010 at 11:59 AM, Fergus Henderson <fergus at google.com> wrote:
>>
>>> On Tue, Jun 29, 2010 at 9:52 AM, Jeff Kilpatrick <
>>> kilpatrick.jeff at gmail.com> wrote:
>>>
>>>> Yes, I have tried both pump and regular mode, and both behave the same
>>>> way.
>>>>
>>>
>>> Well, I don't think it is exactly the same way.  In the non-pump case,
>>> distcc does the preprocessing locally, sends the ".ii" file to the server,
>>> and the server then invokes gcc with the name of the ".ii" file, e.g.
>>> /tmp/distccd_ac31c96a.ii... that is what gcc ends up embedding in the object
>>> file.
>>> In the pump case, the source file names used on the server are the same
>>> as the source file names used on the client, so the problem in your original
>>> email won't happen in that case.
>>>
>>> But there is another problem which may happen in both cases:
>>> distcc changes the command line on the server to use a different object
>>> file name, e.g. "-o ./tmp/distccd_ac31c96a.o",
>>> and gcc may embed the name of the object file in the object file.
>>> In the non-pump case, this changing of the object file name is needed to
>>> ensure that two different distcc invocations on the same server don't try to
>>> write to the same file.
>>> But in the pump case, where the compilation is being invoked in a
>>> temporary directory, I don't think it is actually necessary to change the
>>> object file name...
>>> I think the code to do that has just been inherited for historical
>>> reasons from the non-pump case.
>>> So it may be possible to modify distcc to avoid doing that in the pump
>>> case.
>>> The code which changes the object file name is in the dcc_run_job()
>>> function in src/serve.c (look in particular for the calls to
>>> dcc_set_output(), but other parts of the function would need modification
>>> too).
>>> But I guess if you're not going to be using pump mode, that wouldn't help
>>> you.
>>>
>>> You may find that the object files are more deterministic if you don't
>>> pass the "-g" flag to the compiler.
>>>
>>> Cheers,
>>>    Fergus.
>>>
>>>
>>>> A lot of the projects that I will be compiling include boost, and I
>>>> believe that the pump fails on those, and falls back to regular mode.
>>>>
>>>> -Jeff
>>>>
>>>>
>>>> On Tue, Jun 29, 2010 at 10:48 AM, Fergus Henderson <fergus at google.com> wrote:
>>>>
>>>>> Did you try using pump mode?
>>>>> That should give you a better build speed-up and may also avoid this
>>>>> issue.
>>>>>
>>>>> On Jun 29, 2010 6:32 AM, "Jeff Kilpatrick" <kilpatrick.jeff at gmail.com>
>>>>> wrote:
>>>>> > Oops, my original response went directly to Ihar, rather than to the
>>>>> > list.
>>>>> >
>>>>> > ----
>>>>> >
>>>>> >
>>>>> >
>>>>> > Thank you for your response.
>>>>> >
>>>>> > We do have a tool internally that could 'scrub' the object file of its
>>>>> > dynamic symbols, and could be adapted for this purpose. However, I'm
>>>>> > hesitant to modify the .o and .so files with an external tool, as in
>>>>> > some cases it may be hiding a legitimate issue. Once an exception makes
>>>>> > it into the code, it's tempting to continue adding exceptions to fix
>>>>> > issues. Before you know it, you have 600 branches with unique 'fixes'
>>>>> > to them :)
>>>>> >
>>>>> > Once we get a consistent checksum on the .o and .so files, they'll be
>>>>> > packaged into a .iso, which will also need to be repeatable. This can be
>>>>> > challenging as well, since attributes on the files can affect the final
>>>>> > checksum.
>>>>> >
>>>>> > -Jeff
>>>>> >
>>>>> >
>>>>> > On Tue, Jun 29, 2010 at 6:58 AM, Ihar `Philips` Filipau <
>>>>> > thephilips at gmail.com> wrote:
>>>>> >
>>>>> >> Hi Jeff!
>>>>> >>
>>>>> >> You can try to collect the checksum only for the ELF segments which are
>>>>> >> actually derived from the source code, omitting the segments with the
>>>>> >> compiler's extra info. I do not know any ready tool for the purpose, but
>>>>> >> coding something like this - print on stdout all segments except the
>>>>> >> black-listed - shouldn't be too complicated.
>>>>> >>
>>>>> >>
>>>>> >> On Tue, Jun 29, 2010 at 11:41 AM, Jeff Kilpatrick <
>>>>> >> kilpatrick.jeff at gmail.com> wrote:
>>>>> >>
>>>>> >>> Thank you for your response.
>>>>> >>>
>>>>> >>> Yes, this is the only difference in the object file. We've taken great
>>>>> >>> pains over the last few years, removing anything that would cause
>>>>> >>> checksums to mismatch.
>>>>> >>>
>>>>> >>> I will do some research myself, and talk to a few developers to see if
>>>>> >>> they can help me.
>>>>> >>>
>>>>> >>> Thanks
>>>>> >>> -Jeff
>>>>> >>>
>>>>> >>>
>>>>> >>> On Tue, Jun 29, 2010 at 1:32 AM, Martin Pool <mbp at sourcefrog.net> wrote:
>>>>> >>>
>>>>> >>>> On 29 June 2010 13:02, Jeff Kilpatrick <kilpatrick.jeff at gmail.com>
>>>>> >>>> wrote:
>>>>> >>>> > Hello,
>>>>> >>>> >
>>>>> >>>> > At my work, we've just begun to investigate how much of an impact
>>>>> >>>> > distcc will have on our builds.
>>>>> >>>> >
>>>>> >>>> > We typically perform 200 builds a week, ranging from a thousand lines
>>>>> >>>> > of code up to 600,000 lines of code each. Our back-end build scripts
>>>>> >>>> > are based on Python, and use Linux make to build. We are running
>>>>> >>>> > VMware images on a blade cluster, and each of our three new build
>>>>> >>>> > servers has 20 GHz of processing power, with 4 GB of RAM. Our primary
>>>>> >>>> > build environments are loopback ISOs from a central CIFS server,
>>>>> >>>> > unioned together with unionfs. Our source code is then copied into
>>>>> >>>> > this environment, and we proceed with our build, using chroot to
>>>>> >>>> > enter our build environment. Our 'distcc' machines use the same
>>>>> >>>> > loopback system, with only our OS and distcc being accessible.
>>>>> >>>>
>>>>> >>>> That's pretty cool.
>>>>> >>>>
>>>>> >>>> > One of the most important things for our builds, due to the market
>>>>> >>>> > that we are in, is that our builds must be reproducible, with
>>>>> >>>> > repeatable md5sums on our shared objects, based on the same label and
>>>>> >>>> > same dependencies. In our recent tests, we were able to take a
>>>>> >>>> > particular build from 24 minutes to 14 minutes, then finally 5
>>>>> >>>> > minutes, using distcc and adjusting our VMs. However, when performing
>>>>> >>>> > an md5sum on our final shared objects / object files, the checksums
>>>>> >>>> > change every build. We dropped down to just using g++ to perform our
>>>>> >>>> > linking, all locally, but our object files are still mismatching.
>>>>> >>>> >
>>>>> >>>> > In the object files' `objdump -s` output, it appears that an entry of
>>>>> >>>> > the form "distccd_XXXXX" is being written into all our object files,
>>>>> >>>> > with XXXXX being a seemingly random combination of characters.
>>>>> >>>>
>>>>> >>>> Hi Jeff,
>>>>> >>>>
>>>>> >>>> I think this is coming from gcc recording the input file name in the
>>>>> >>>> object file. distccd_xxxx.ii is the temporary file name used on the
>>>>> >>>> server.
>>>>> >>>>
>>>>> >>>> > In the same object file, compiled locally without distcc, we get a
>>>>> >>>> > rather generic <built-in> placeholder.
>>>>> >>>>
>>>>> >>>> I think this means it's coming from the builtin preprocessor.
>>>>> >>>>
>>>>> >>>> I probably won't have time to work on this myself but if you have a
>>>>> >>>> programmer interested in it there are two possible avenues:
>>>>> >>>>
>>>>> >>>> - make gcc read from a file called <built-in> in a temporary
>>>>> >>>>   subdirectory
>>>>> >>>>
>>>>> >>>> - find some way to stop it recording the compiler input file name
>>>>> >>>>
>>>>> >>>> Is that the only difference in the object files? It's pretty common
>>>>> >>>> for compilers to also record something about the time the compilation
>>>>> >>>> was run or for source files to build this in, which would mean they
>>>>> >>>> change every time.
>>>>> >>>>
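
One way to check whether the temporary name really is the only difference
would be to compare two builds of the same object file in memory with the
random part of the name masked out. This is only a rough sketch under stated
assumptions: the function names are made up for illustration, and the random
part is assumed to be 8 hex digits, as in the distccd_ac31c96a example from
this thread. It only compares copies held in memory; it does not touch the
files on disk.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TMP_PREFIX "distccd_"
#define TMP_HEXLEN 8   /* assumed length of the random part */

/*
 * Overwrite the random part of every "distccd_XXXXXXXX" name in buf
 * with 'X'. Returns the number of names masked. Works in place.
 */
size_t mask_distccd_names(unsigned char *buf, size_t len)
{
    const size_t plen = strlen(TMP_PREFIX);
    size_t i = 0, masked = 0, last;

    if (len < plen + TMP_HEXLEN)
        return 0;
    last = len - plen - TMP_HEXLEN;   /* last offset a full name can start at */

    while (i <= last) {
        size_t j;

        if (memcmp(buf + i, TMP_PREFIX, plen) != 0) {
            i++;
            continue;
        }
        for (j = 0; j < TMP_HEXLEN; j++)
            if (!isxdigit(buf[i + plen + j]))
                break;
        if (j == TMP_HEXLEN) {
            memset(buf + i + plen, 'X', TMP_HEXLEN);
            masked++;
            i += plen + TMP_HEXLEN;
        } else {
            i++;
        }
    }
    return masked;
}

/* Return 1 if the buffers are identical once temporary names are masked. */
int same_except_tmp_names(unsigned char *a, size_t alen,
                          unsigned char *b, size_t blen)
{
    if (alen != blen)
        return 0;
    mask_distccd_names(a, alen);
    mask_distccd_names(b, blen);
    return memcmp(a, b, alen) == 0;
}
```

If two builds compare equal this way, the temporary name is the only thing
changing; if not, something else (a timestamp, for instance) is also being
recorded.
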
>>>>> >>>> >
>>>>> >>>> > I've reviewed the source code for distcc, and seen a few references
>>>>> >>>> > to this distccd_xxxxx. Unfortunately, I'm not a programmer, and thus
>>>>> >>>> > am at a loss on how to further troubleshoot this, or even if it's
>>>>> >>>> > possible to get consistent checksums with distcc.
>>>>> >>>> >
>>>>> >>>> >
>>>>> >>>> > Versions
>>>>> >>>> > =======
>>>>> >>>> > g++ (Gentoo 4.3.2-r4 p1.8, pie-10.1.5) 4.3.2
>>>>> >>>> >
>>>>> >>>> > distcc 3.1 i686-pc-linux-gnu
>>>>> >>>> > (protocols 1, 2 and 3) (default port 3632)
>>>>> >>>> > built Mar 29 2010 10:55:35
>>>>> >>>> >
>>>>> >>>> > Kernel: 2.6.9-89.ELsmp
>>>>> >>>> >
>>>>> >>>> > Command being issued:
>>>>> >>>> > DISTCC_VERBOSE=1 make -j24 CXX="distcc"
>>>>> >>>> >
>>>>> >>>> > Here's the partial output of objdump -s:
>>>>> >>>> > 04f0 00030000 5f6d6f76 655f636f 6e737472 ...._move_constr
>>>>> >>>> > 0500 7563745f 66776b2e 68000300 00474454 uct_fwk.h....GDT
>>>>> >>>> > 0510 79706573 2e68000a 00007365 72646566 ypes.h....serdef
>>>>> >>>> > 0520 732e6800 01000073 75666669 782e6870 s.h....suffix.hp
>>>>> >>>> > 0530 70000b00 00646973 74636364 5f616333 p....distccd_ac3
>>>>> >>>> > 0540 31633936 612e6969 000c0000 61646c5f 1c96a.ii....adl_
>>>>> >>>> > 0550 62617272 6965722e 68707000 0d000062 barrier.hpp....b
>>>>> >>>> > 0560 6f6f6c5f 6677642e 68707000 0e000069 ool_fwd.hpp....i
>>>>> >>>> > 0570 6e746567 72616c5f 635f7461 672e6870 ntegral_c_tag.hp
>>>>> >>>> > 0580 70000e00 00766f69 645f6677 642e6870 p....void_fwd.hp
>>>>> >>>> >
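
For anyone wanting to confirm what the hex columns in a dump like the one
above contain, a small decoder can turn a row back into text. This is a
sketch only; decode_hex_words() and hex_value() are names made up for
illustration and are not part of objdump or distcc.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Decode the hex columns of one `objdump -s` row into out, skipping spaces.
 * Non-printable bytes become '.', as in objdump's own ASCII column.
 * Returns the number of bytes decoded, or -1 on bad input or overflow.
 */
int decode_hex_words(const char *hex, char *out, size_t outlen)
{
    size_t n = 0;

    if (outlen == 0)
        return -1;
    while (*hex != '\0') {
        int hi, lo, byte;

        if (*hex == ' ') {
            hex++;
            continue;
        }
        hi = hex_value((unsigned char)hex[0]);
        if (hi < 0)
            return -1;
        lo = hex_value((unsigned char)hex[1]);
        if (lo < 0)
            return -1;
        if (n + 1 >= outlen)
            return -1;
        byte = hi * 16 + lo;
        out[n++] = isprint(byte) ? (char)byte : '.';
        hex += 2;
    }
    out[n] = '\0';
    return (int)n;
}
```

Feeding it the hex columns of rows 0530 and 0540 gives "p....distccd_ac3"
and "1c96a.ii....adl_", i.e. the embedded name is distccd_ac31c96a.ii.
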
>>>>> >>>> > Thank you for reviewing my issue.
>>>>> >>>> >
>>>>> >>>> > -Jeff
>>>>> >>>> >
>>>>> >>>> > __
>>>>> >>>> > distcc mailing list http://distcc.samba.org/
>>>>> >>>> > To unsubscribe or change options:
>>>>> >>>> > https://lists.samba.org/mailman/listinfo/distcc
>>>>> >>>> >
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> Martin
>>>>> >>>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Don't walk behind me, I may not lead.
>>>>> >> Don't walk in front of me, I may not follow.
>>>>> >> Just walk beside me and be my friend.
>>>>> >> -- Albert Camus (attributed to)
>>>>> >>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Fergus Henderson <fergus at google.com>
>>>
>>
>>
>
>
> --
> Fergus Henderson <fergus at google.com>
>