[distcc] Fwd: Repeatable .o and .so checksums with distcc

Jeff Kilpatrick kilpatrick.jeff at gmail.com
Tue Jun 29 09:15:31 MDT 2010


Hey Fergus.

You are correct about the "another problem which may happen".  I applied the
fix you suggested: the dcc_set_input() and dcc_set_output() calls now pass
orig_input and orig_output instead of temp_i and temp_o, and I am now getting
consistent checksums. I will be running builds all afternoon to confirm that
the checksums match every single time.

Thank you all so very much. You have literally saved us thousands of hours
in compile time, per week.

-Jeff

My changes:

serve.c:

    if (cpp_where == DCC_CPP_ON_SERVER) {
        /* Was: dcc_set_output(argv, temp_o) */
        if (dcc_r_many_files(in_fd, temp_dir, compr)
            || dcc_set_output(argv, orig_output)
            || tweak_arguments_for_server(argv, temp_dir, deps_fname,
                                          &dotd_target, &tweaked_argv))
            goto out_cleanup;

        /* Was: dcc_set_input(argv, temp_i) and dcc_set_output(argv, temp_o) */
        if ((ret = dcc_r_token_file(in_fd, "DOTI", temp_i, compr))
            || (ret = dcc_set_input(argv, orig_input))
            || (ret = dcc_set_output(argv, orig_output)))
            goto out_cleanup;
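
For anyone who wants to check their own object files for the temporary name
without reading objdump output by eye, here is a minimal sketch. It is not part
of distcc: find_marker() and file_find_marker() are hypothetical helper names I
made up for illustration, and the only assumption is that the server-side
temporary name shows up in the file as the literal bytes "distccd_".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the offset of the first occurrence of marker in buf, or -1.
 * Compares raw bytes, so the NUL bytes inside an object file are fine. */
long find_marker(const unsigned char *buf, size_t len, const char *marker)
{
    size_t mlen = strlen(marker);
    if (mlen == 0 || mlen > len)
        return -1;
    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return (long)i;
    }
    return -1;
}

/* Read a whole file into memory and search it.
 * Returns the offset, -1 if the marker is absent, -2 on an I/O error. */
long file_find_marker(const char *path, const char *marker)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -2;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -2; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return -2; }
    rewind(f);

    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return -2; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);

    long off = (got == (size_t)size) ? find_marker(buf, got, marker) : -2;
    free(buf);
    return off;
}
```

Calling file_find_marker("foo.o", "distccd_") on each object file after a build
and treating any result other than -1 as a failure would flag a file that still
carries the temporary name (or could not be read). It only looks for that one
string, so it does not replace comparing the md5sums themselves.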

On Tue, Jun 29, 2010 at 11:59 AM, Fergus Henderson <fergus at google.com> wrote:

> On Tue, Jun 29, 2010 at 9:52 AM, Jeff Kilpatrick <
> kilpatrick.jeff at gmail.com> wrote:
>
>> Yes, I have tried both pump and regular mode, and both behave the same
>> way.
>>
>
> Well, I don't think it is exactly the same way.  In the non-pump case,
> distcc does the preprocessing locally, sends the ".ii" file to the server,
> and the server then invokes gcc with the name of the ".ii" file, e.g.
> /tmp/distccd_ac31c96a.ii... that is what gcc ends up embedding in the object
> file.
> In the pump case, the source file names used on the server are the same as
> the source file names used on the client, so the problem in your original
> email won't happen in that case.
>
> But there is another problem which may happen in both cases:
> distcc changes the command line on the server to use a different object
> file name, e.g. "-o ./tmp/distccd_ac31c96a.o",
> and gcc may embed the name of the object file in the object file.
> In the non-pump case, this changing of the object file name is needed to
> ensure that two different distcc invocations on the same server don't try to
> write to the same file.
> But in the pump case, where the compilation is being invoked in a temporary
> directory, I don't think it is actually necessary to change the object file
> name...
> I think the code to do that has just been inherited for historical reasons
> from the non-pump case.
> So it may be possible to modify distcc to avoid doing that in the pump
> case.
> The code which changes the object file name is in the dcc_run_job()
> function in src/serve.c (look in particular for the calls to
> dcc_set_output(), but other parts of the function would need modification
> too).
> But I guess if you're not going to be using pump mode, that wouldn't help
> you.
>
> You may find that the object files are more deterministic if you don't pass
> the "-g" flag to the compiler.
>
> Cheers,
>    Fergus.
>
>
>> A lot of the projects that I will be compiling include boost, and I
>> believe that the pump fails on those, and falls back to regular mode.
>>
>> -Jeff
>>
>>
>> On Tue, Jun 29, 2010 at 10:48 AM, Fergus Henderson <fergus at google.com> wrote:
>>
>>> Did you try using pump mode?
>>> That should give you a better build speed-up and may also avoid this
>>> issue.
>>>
>>> On Jun 29, 2010 6:32 AM, "Jeff Kilpatrick" <kilpatrick.jeff at gmail.com>
>>> wrote:
>>> > Oops, my original response went directly to Ihar, rather than to the
>>> > list.
>>> >
>>> > ----
>>> >
>>> >
>>> >
>>> > Thank you for your response.
>>> >
>>> > We do have a tool internally that could 'scrub' the object file of its
>>> > dynamic symbols, and could be adapted for this purpose. However, I'm
>>> > hesitant to modify the .o and .so files with an external tool, as in
>>> > some cases it may be hiding a legitimate issue. Once an exception makes
>>> > it into the code, it's tempting to continue adding exceptions to fix
>>> > issues. Before you know it, you have 600 branches with unique 'fixes'
>>> > to them :)
>>> >
>>> > Once we get a consistent checksum on the .o and .so files, they'll be
>>> > packaged into a .iso, which will also need to be repeatable. This can
>>> > be challenging as well, since attributes on the files can affect the
>>> > final checksum.
>>> >
>>> > -Jeff
>>> >
>>> >
>>> > On Tue, Jun 29, 2010 at 6:58 AM, Ihar `Philips` Filipau <
>>> > thephilips at gmail.com> wrote:
>>> >
>>> >> Hi Jeff!
>>> >>
>>> >> You can try to collect the checksum only for the ELF segments which
>>> >> are actually derived from the source code, omitting the segments with
>>> >> the compiler's extra info. I do not know of any ready-made tool for
>>> >> the purpose, but coding something like this (print to stdout all
>>> >> segments except the blacklisted ones) shouldn't be too complicated.
>>> >>
>>> >>
>>> >> On Tue, Jun 29, 2010 at 11:41 AM, Jeff Kilpatrick <
>>> >> kilpatrick.jeff at gmail.com> wrote:
>>> >>
>>> >>> Thank you for your response.
>>> >>>
>>> >>> Yes, this is the only difference in the object file. We've taken
>>> >>> great pains over the last few years, removing anything that would
>>> >>> cause checksums to mismatch.
>>> >>>
>>> >>> I will do some research myself, and talk to a few developers to see
>>> >>> if they can help me.
>>> >>>
>>> >>> Thanks
>>> >>> -Jeff
>>> >>>
>>> >>>
>>> >>> On Tue, Jun 29, 2010 at 1:32 AM, Martin Pool <mbp at sourcefrog.net>
>>> wrote:
>>> >>>
>>> >>>> On 29 June 2010 13:02, Jeff Kilpatrick <kilpatrick.jeff at gmail.com>
>>> >>>> wrote:
>>> >>>> > Hello,
>>> >>>> >
>>> >>>> > At my work, we've just begun to investigate how much of an impact
>>> >>>> > distcc will have on our builds.
>>> >>>> >
>>> >>>> > We typically perform 200 builds a week, ranging from a thousand
>>> >>>> > lines of code up to 600,000 lines of code each. Our back-end build
>>> >>>> > scripts are based on Python, and use Linux make to build. We are
>>> >>>> > running VMware images on a blade cluster, and each of our three new
>>> >>>> > build servers has 20 GHz of processing power, with 4 GB of RAM. Our
>>> >>>> > primary build environments are loopback ISOs, from a central CIFS
>>> >>>> > server, and are unioned together with unionfs. Our source code is
>>> >>>> > then copied into this environment, and we proceed with our build,
>>> >>>> > using chroot to enter our build environment. Our 'distcc' machines
>>> >>>> > use the same loopback system, with only our OS and distcc being
>>> >>>> > accessible.
>>> >>>>
>>> >>>> That's pretty cool.
>>> >>>>
>>> >>>> > One of the most important things for our builds, due to the market
>>> >>>> > that we are in, is that our builds must be reproducible, with
>>> >>>> > repeatable md5sums on our shared objects, based on the same label
>>> >>>> > and same dependencies. In our recent tests, we were able to take a
>>> >>>> > particular build from 24 minutes to 14 minutes, then finally 5
>>> >>>> > minutes, using distcc and adjusting our VMs. However, when
>>> >>>> > performing an md5sum on our final shared objects / object files,
>>> >>>> > the checksums change every build. We dropped down to just using g++
>>> >>>> > to perform our linking, all locally, but our object files are
>>> >>>> > still mismatching.
>>> >>>> >
>>> >>>> > In the object files' `objdump -s` output, it appears that an entry
>>> >>>> > is being made into all our object files with the following syntax
>>> >>>> > "distccd_XXXXX", with XXXXX being a seemingly random combination of
>>> >>>> > characters.
>>> >>>>
>>> >>>> Hi Jeff,
>>> >>>>
>>> >>>> I think this is coming from gcc recording the input file name in the
>>> >>>> object file. distccd_xxxx.ii is the temporary file name used on the
>>> >>>> server.
>>> >>>>
>>> >>>> > In the same object file, compiled locally without distcc, we get a
>>> >>>> > rather generic <built-in> placeholder.
>>> >>>>
>>> >>>> I think this means it's coming from the builtin preprocessor.
>>> >>>>
>>> >>>> I probably won't have time to work on this myself but if you have a
>>> >>>> programmer interested in it there are two possible avenues:
>>> >>>>
>>> >>>> - make gcc read from a file called <built-in> in a temporary
>>> >>>>   subdirectory
>>> >>>>
>>> >>>> - find some way to stop it recording the compiler input file name
>>> >>>>
>>> >>>> Is that the only difference in the object files? It's pretty common
>>> >>>> for compilers to also record something about the time the
>>> >>>> compilation was run or for source files to build this in, which
>>> >>>> would mean they change every time.
>>> >>>>
>>> >>>> >
>>> >>>> > I've reviewed the source code for distcc, and seen a few
>>> >>>> > references to this distccd_xxxxx. Unfortunately, I'm not a
>>> >>>> > programmer, and thus am at a loss on how to further troubleshoot
>>> >>>> > this, or even whether it's possible to get consistent checksums
>>> >>>> > with distcc.
>>> >>>> >
>>> >>>> >
>>> >>>> > Versions
>>> >>>> > =======
>>> >>>> > g++ (Gentoo 4.3.2-r4 p1.8, pie-10.1.5) 4.3.2
>>> >>>> >
>>> >>>> > distcc 3.1 i686-pc-linux-gnu
>>> >>>> > (protocols 1, 2 and 3) (default port 3632)
>>> >>>> > built Mar 29 2010 10:55:35
>>> >>>> >
>>> >>>> > Kernel: 2.6.9-89.ELsmp
>>> >>>> >
>>> >>>> > Command being issued:
>>> >>>> > DISTCC_VERBOSE=1 make -j24 CXX="distcc"
>>> >>>> >
>>> >>>> > Here's the partial output of objdump -s:
>>> >>>> > 04f0 00030000 5f6d6f76 655f636f 6e737472 ...._move_constr
>>> >>>> > 0500 7563745f 66776b2e 68000300 00474454 uct_fwk.h....GDT
>>> >>>> > 0510 79706573 2e68000a 00007365 72646566 ypes.h....serdef
>>> >>>> > 0520 732e6800 01000073 75666669 782e6870 s.h....suffix.hp
>>> >>>> > 0530 70000b00 00646973 74636364 5f616333 p....distccd_ac3
>>> >>>> > 0540 31633936 612e6969 000c0000 61646c5f 1c96a.ii....adl_
>>> >>>> > 0550 62617272 6965722e 68707000 0d000062 barrier.hpp....b
>>> >>>> > 0560 6f6f6c5f 6677642e 68707000 0e000069 ool_fwd.hpp....i
>>> >>>> > 0570 6e746567 72616c5f 635f7461 672e6870 ntegral_c_tag.hp
>>> >>>> > 0580 70000e00 00766f69 645f6677 642e6870 p....void_fwd.hp
>>> >>>> >
>>> >>>> > Thank you for reviewing my issue.
>>> >>>> >
>>> >>>> > -Jeff
>>> >>>> >
>>> >>>> > __
>>> >>>> > distcc mailing list http://distcc.samba.org/
>>> >>>> > To unsubscribe or change options:
>>> >>>> > https://lists.samba.org/mailman/listinfo/distcc
>>> >>>> >
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Martin
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Don't walk behind me, I may not lead.
>>> >> Don't walk in front of me, I may not follow.
>>> >> Just walk beside me and be my friend.
>>> >> -- Albert Camus (attributed to)
>>> >>
>>>
>>
>>
>
>
> --
> Fergus Henderson <fergus at google.com>
>