[distcc] Fwd: Repeatable .o and .so checksums with distcc

Fergus Henderson fergus at google.com
Tue Jun 29 09:50:04 MDT 2010


On Tue, Jun 29, 2010 at 11:15 AM, Jeff Kilpatrick
<kilpatrick.jeff at gmail.com> wrote:

> Hey Fergus.
>
> You are correct about the "another problem which may happen".  I applied
> the fix you suggested, and set the temp_o and temp_i back to orig_output and
> orig_input through the dcc_set_output() calls, and I am now getting
> consistent checksums. I will be doing builds all through the afternoon to
> confirm checksums match every single time.
>
> Thank you all so very much. You have literally saved us thousands of hours
> in compile time, per week.
>
> -Jeff
>
> My changes:
>
> serve.c:
>
>     if (cpp_where == DCC_CPP_ON_SERVER) {
>         if (dcc_r_many_files(in_fd, temp_dir, compr)
> //            || dcc_set_output(argv, temp_o)
>             || dcc_set_output(argv, orig_output)
>             || tweak_arguments_for_server(argv, temp_dir, deps_fname,
>                                           &dotd_target, &tweaked_argv))
>             goto out_cleanup;
>
>         if ((ret = dcc_r_token_file(in_fd, "DOTI", temp_i, compr))
> //            || (ret = dcc_set_input(argv, temp_i))
> //            || (ret = dcc_set_output(argv, temp_o)))
>             || (ret = dcc_set_input(argv, orig_input))
>             || (ret = dcc_set_output(argv, orig_output)))
>             goto out_cleanup;


When posting patches to the mailing list, please use "svn diff" or
"diff -u".

If that's all you've changed, I don't think your patch is correct.
You'd need to also update the code which sends the object file back to the
client:

    if ((ret = dcc_x_file(out_fd, temp_o, "DOTO", compr, NULL)))
        goto out_cleanup;
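
As a rough illustration of that point (a minimal sketch, not distcc code):
whichever name is passed to "-o" is the file the compiler writes, so the same
name has to be used when the result is sent back. The struct job_paths, its
keep_orig_output flag, and result_path() are hypothetical names made up for
this example; only temp_o and orig_output come from serve.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: this struct and function are NOT part
 * of distcc. */
struct job_paths {
    const char *orig_output;  /* output name from the client's command line */
    const char *temp_o;       /* server-side temporary object file name */
    int keep_orig_output;     /* nonzero if "-o" was set to orig_output */
};

/* Return the file the compiler was told to write. That is the file which
 * has to be sent back to the client, whichever name "-o" was given. */
static const char *result_path(const struct job_paths *p)
{
    assert(p != NULL);
    return p->keep_orig_output ? p->orig_output : p->temp_o;
}
```

In serve.c terms, the value such a helper returns is what would presumably
have to be passed to dcc_x_file() in place of a hard-coded temp_o.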

Also, I think your change may cause problems in non-pump mode if two
different clients attempt to compile the same object file at the same time.
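
To make that concern concrete, here is a small sketch, assuming a POSIX
system, of how unique per-job names avoid the collision. make_unique_object()
and the "/tmp/sketch_XXXXXX" template are hypothetical and are not distcc's
actual temp-file code; mkstemp() is the standard POSIX call.

```c
/* Ask for the POSIX declarations (mkstemp) even under a strict -std= mode. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, NOT distcc's API: create a uniquely named, empty
 * temporary file and store its path in buf. Returns 0 on success and -1
 * on failure. */
static int make_unique_object(char *buf, size_t len)
{
    static const char tmpl[] = "/tmp/sketch_XXXXXX";
    int fd;

    assert(buf != NULL);
    if (len < sizeof(tmpl))
        return -1;                   /* caller's buffer is too small */
    memcpy(buf, tmpl, sizeof(tmpl)); /* copies the trailing NUL as well */

    fd = mkstemp(buf);  /* fills in XXXXXX and creates the file atomically */
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}
```

Two jobs that each call make_unique_object() get different paths, whereas two
jobs that both write to the client's original output name (say foo.o) in a
shared directory could overwrite each other's results.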

Cheers,
  Fergus.


>
> On Tue, Jun 29, 2010 at 11:59 AM, Fergus Henderson
> <fergus at google.com> wrote:
>
>> On Tue, Jun 29, 2010 at 9:52 AM, Jeff Kilpatrick
>> <kilpatrick.jeff at gmail.com> wrote:
>>
>>> Yes, I have tried both pump and regular mode, and both behave the same
>>> way.
>>>
>>
>> Well, I don't think it is exactly the same way.  In the non-pump case,
>> distcc does the preprocessing locally, sends the ".ii" file to the server,
>> and the server then invokes gcc with the name of the ".ii" file, e.g.
>> /tmp/distccd_ac31c96a.ii... that is what gcc ends up embedding in the object
>> file.
>> In the pump case, the source file names used on the server are the same as
>> the source file names used on the client, so the problem in your original
>> email won't happen in that case.
>>
>> But there is another problem which may happen in both cases:
>> distcc changes the command line on the server to use a different object
>> file name, e.g. "-o ./tmp/distccd_ac31c96a.o",
>> and gcc may embed the name of the object file in the object file.
>> In the non-pump case, this changing of the object file name is needed to
>> ensure that two different distcc invocations on the same server don't try to
>> write to the same file.
>> But in the pump case, where the compilation is being invoked in a
>> temporary directory, I don't think it is actually necessary to change the
>> object file name...
>> I think the code to do that has just been inherited for historical reasons
>> from the non-pump case.
>> So it may be possible to modify distcc to avoid doing that in the pump
>> case.
>> The code which changes the object file name is in the dcc_run_job()
>> function in src/serve.c (look in particular for the calls to
>> dcc_set_output(), but other parts of the function would need modification
>> too).
>> But I guess if you're not going to be using pump mode, that wouldn't help
>> you.
>>
>> You may find that the object files are more deterministic if you don't
>> pass the "-g" flag to the compiler.
>>
>> Cheers,
>>    Fergus.
>>
>>
>>> A lot of the projects that I will be compiling include boost, and I
>>> believe that the pump fails on those, and falls back to regular mode.
>>>
>>> -Jeff
>>>
>>>
>>> On Tue, Jun 29, 2010 at 10:48 AM, Fergus Henderson
>>> <fergus at google.com> wrote:
>>>
>>>> Did you try using pump mode?
>>>> That should give you a better build speed-up and may also avoid this
>>>> issue.
>>>>
>>>> On Jun 29, 2010 6:32 AM, "Jeff Kilpatrick" <kilpatrick.jeff at gmail.com>
>>>> wrote:
>>>> > Oops, my original response went directly to Ihar, rather than to the
>>>> > list.
>>>> >
>>>> > ----
>>>> >
>>>> >
>>>> >
>>>> > Thank you for your response.
>>>> >
>>>> > We do have a tool internally that could 'scrub' the object file of its
>>>> > dynamic symbols, and could be adapted for this purpose. However, I'm
>>>> > hesitant to modify the .o and .so files with an external tool, as in
>>>> > some cases it may be hiding a legitimate issue. Once an exception makes
>>>> > it into the code, it's tempting to continue adding exceptions to fix
>>>> > issues. Before you know it, you have 600 branches with unique 'fixes'
>>>> > to them :)
>>>> >
>>>> > Once we get a consistent checksum on the .o and .so files, they'll be
>>>> > packaged into a .iso, which will also need to be repeatable. This can be
>>>> > challenging as well, since attributes on the files can affect the final
>>>> > checksum.
>>>> >
>>>> > -Jeff
>>>> >
>>>> >
>>>> > On Tue, Jun 29, 2010 at 6:58 AM, Ihar `Philips` Filipau
>>>> > <thephilips at gmail.com> wrote:
>>>> >
>>>> >> Hi Jeff!
>>>> >>
>>>> >> You can try to collect the checksum only for the ELF segments which are
>>>> >> actually derived from the source code, omitting the segments with the
>>>> >> compiler's extra info. I do not know of any ready-made tool for the
>>>> >> purpose, but coding something like this (print on stdout all segments
>>>> >> except the black-listed ones) shouldn't be too complicated.
>>>> >>
>>>> >>
>>>> >> On Tue, Jun 29, 2010 at 11:41 AM, Jeff Kilpatrick
>>>> >> <kilpatrick.jeff at gmail.com> wrote:
>>>> >>
>>>> >>> Thank you for your response.
>>>> >>>
>>>> >>> Yes, this is the only difference in the object file. We've taken great
>>>> >>> pains over the last few years, removing anything that would cause
>>>> >>> checksums to mismatch.
>>>> >>>
>>>> >>> I will do some research myself, and talk to a few developers to see if
>>>> >>> they can help me.
>>>> >>>
>>>> >>> Thanks
>>>> >>> -Jeff
>>>> >>>
>>>> >>>
>>>> >>> On Tue, Jun 29, 2010 at 1:32 AM, Martin Pool <mbp at sourcefrog.net>
>>>> >>> wrote:
>>>> >>>
>>>> >>>> On 29 June 2010 13:02, Jeff Kilpatrick <kilpatrick.jeff at gmail.com>
>>>> >>>> wrote:
>>>> >>>> > Hello,
>>>> >>>> >
>>>> >>>> > At my work, we've just begun to investigate how much of an impact
>>>> >>>> > distcc will have on our builds.
>>>> >>>> >
>>>> >>>> > We typically perform 200 builds a week, ranging from a thousand lines
>>>> >>>> > of code up to 600,000 lines of code each. Our back-end build scripts
>>>> >>>> > are based on Python, and use Linux make to build. We are running
>>>> >>>> > VMware images on a blade cluster, and each of our three new build
>>>> >>>> > servers has 20 GHz of processing power and 4 GB of RAM. Our primary
>>>> >>>> > build environments are loopback ISOs from a central CIFS server,
>>>> >>>> > unioned together with unionfs. Our source code is then copied into
>>>> >>>> > this environment, and we proceed with our build, using chroot to
>>>> >>>> > enter our build environment. Our 'distcc' machines use the same
>>>> >>>> > loopback system, with only our OS and distcc being accessible.
>>>> >>>>
>>>> >>>> That's pretty cool.
>>>> >>>>
>>>> >>>> > One of the most important things for our builds, due to the market
>>>> >>>> > that we are in, is that our builds must be reproducible, with
>>>> >>>> > repeatable md5sums on our shared objects, based on the same label and
>>>> >>>> > same dependencies. In our recent tests, we were able to take a
>>>> >>>> > particular build from 24 minutes to 14 minutes, then finally 5
>>>> >>>> > minutes, using distcc and adjusting our VMs. However, when performing
>>>> >>>> > an md5sum on our final shared objects / object files, the checksums
>>>> >>>> > change every build. We dropped down to just using g++ to perform our
>>>> >>>> > linking, all locally, but our object files are still mismatching.
>>>> >>>> >
>>>> >>>> > In the object files' `objdump -s` output, it appears that an entry is
>>>> >>>> > being made into all our object files with the following syntax:
>>>> >>>> > "distccd_XXXXX", with XXXXX being a seemingly random combination of
>>>> >>>> > characters.
>>>> >>>>
>>>> >>>> Hi Jeff,
>>>> >>>>
>>>> >>>> I think this is coming from gcc recording the input file name in the
>>>> >>>> object file. distccd_xxxx.ii is the temporary file name used on the
>>>> >>>> server.
>>>> >>>>
>>>> >>>> > In the same object file, compiled locally without distcc, we get a
>>>> >>>> > rather generic <built-in> placeholder.
>>>> >>>>
>>>> >>>> I think this means it's coming from the builtin preprocessor.
>>>> >>>>
>>>> >>>> I probably won't have time to work on this myself but if you have a
>>>> >>>> programmer interested in it there are two possible avenues:
>>>> >>>>
>>>> >>>> - make gcc read from a file called <built-in> in a temporary
>>>> >>>>   subdirectory
>>>> >>>>
>>>> >>>> - find some way to stop it recording the compiler input file name
>>>> >>>>
>>>> >>>> Is that the only difference in the object files? It's pretty common
>>>> >>>> for compilers to also record something about the time the compilation
>>>> >>>> was run or for source files to build this in, which would mean they
>>>> >>>> change every time.
>>>> >>>>
>>>> >>>> >
>>>> >>>> > I've reviewed the source code for distcc, and seen a few references
>>>> >>>> > to this distccd_xxxxx. Unfortunately, I'm not a programmer, and thus
>>>> >>>> > am at a loss on how to further troubleshoot this, or even if it's
>>>> >>>> > possible to get consistent checksums with distcc.
>>>> >>>> >
>>>> >>>> >
>>>> >>>> > Versions
>>>> >>>> > =======
>>>> >>>> > g++ (Gentoo 4.3.2-r4 p1.8, pie-10.1.5) 4.3.2
>>>> >>>> >
>>>> >>>> > distcc 3.1 i686-pc-linux-gnu
>>>> >>>> > (protocols 1, 2 and 3) (default port 3632)
>>>> >>>> > built Mar 29 2010 10:55:35
>>>> >>>> >
>>>> >>>> > Kernel: 2.6.9-89.ELsmp
>>>> >>>> >
>>>> >>>> > Command being issued:
>>>> >>>> > DISTCC_VERBOSE=1 make -j24 CXX="distcc"
>>>> >>>> >
>>>> >>>> > Here's the partial output of objdump -s:
>>>> >>>> > 04f0 00030000 5f6d6f76 655f636f 6e737472 ...._move_constr
>>>> >>>> > 0500 7563745f 66776b2e 68000300 00474454 uct_fwk.h....GDT
>>>> >>>> > 0510 79706573 2e68000a 00007365 72646566 ypes.h....serdef
>>>> >>>> > 0520 732e6800 01000073 75666669 782e6870 s.h....suffix.hp
>>>> >>>> > 0530 70000b00 00646973 74636364 5f616333 p....distccd_ac3
>>>> >>>> > 0540 31633936 612e6969 000c0000 61646c5f 1c96a.ii....adl_
>>>> >>>> > 0550 62617272 6965722e 68707000 0d000062 barrier.hpp....b
>>>> >>>> > 0560 6f6f6c5f 6677642e 68707000 0e000069 ool_fwd.hpp....i
>>>> >>>> > 0570 6e746567 72616c5f 635f7461 672e6870 ntegral_c_tag.hp
>>>> >>>> > 0580 70000e00 00766f69 645f6677 642e6870 p....void_fwd.hp
>>>> >>>> >
>>>> >>>> > Thank you for reviewing my issue.
>>>> >>>> >
>>>> >>>> > -Jeff
>>>> >>>> >
>>>> >>>> > __
>>>> >>>> > distcc mailing list http://distcc.samba.org/
>>>> >>>> > To unsubscribe or change options:
>>>> >>>> > https://lists.samba.org/mailman/listinfo/distcc
>>>> >>>> >
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> --
>>>> >>>> Martin
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Don't walk behind me, I may not lead.
>>>> >> Don't walk in front of me, I may not follow.
>>>> >> Just walk beside me and be my friend.
>>>> >> -- Albert Camus (attributed to)
>>>> >>
>>>>
>>>
>>>
>>
>>
>> --
>> Fergus Henderson <fergus at google.com>
>>
>
>


-- 
Fergus Henderson <fergus at google.com>