[distcc] Fwd: Repeatable .o and .so checksums with distcc

Fergus Henderson fergus at google.com
Tue Jun 29 08:59:17 MDT 2010


On Tue, Jun 29, 2010 at 9:52 AM, Jeff Kilpatrick
<kilpatrick.jeff at gmail.com> wrote:

> Yes, I have tried both pump and regular mode, and both behave the same way.
>

Well, I don't think they behave in exactly the same way.  In the non-pump case,
distcc does the preprocessing locally, sends the ".ii" file to the server,
and the server then invokes gcc with the name of that ".ii" file, e.g.
/tmp/distccd_ac31c96a.ii; that name is what gcc ends up embedding in the
object file.
In the pump case, the source file names used on the server are the same as
the source file names used on the client, so the problem in your original
email won't happen in that case.
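
To check whether a given object file actually contains one of these
temporary names, a small script along the following lines might help.
This is only a sketch: the function name and the regular expression are
assumptions based on the names quoted in this thread (for example
distccd_ac31c96a.ii), and distcc may well use other forms.

```python
import re
import sys

# Assumed pattern, modelled on the names quoted in this thread
# (e.g. "distccd_ac31c96a.ii"); not taken from the distcc source.
TEMP_NAME = re.compile(rb"distccd_[0-9A-Za-z]+\.(?:ii|i|o)\b")


def find_distcc_temp_names(data):
    """Return the sorted, de-duplicated temp names found in raw bytes."""
    return sorted({m.group(0).decode("ascii") for m in TEMP_NAME.finditer(data)})


if __name__ == "__main__":
    # Usage: python find_temp_names.py foo.o bar.o ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            names = find_distcc_temp_names(f.read())
        print(path, names if names else "no distccd_ temp names found")
```

Running it over the object files from a non-pump build and a pump build
should show whether the ".ii" name only appears in the former.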

But there is another problem which may happen in both cases:
distcc changes the command line on the server to use a different object file
name, e.g. "-o ./tmp/distccd_ac31c96a.o",
and gcc may embed the name of the object file in the object file.
In the non-pump case, this changing of the object file name is needed to
ensure that two different distcc invocations on the same server don't try to
write to the same file.
But in the pump case, where the compilation is being invoked in a temporary
directory, I don't think it is actually necessary to change the object file
name...
I think the code to do that has just been inherited for historical reasons
from the non-pump case.
So it may be possible to modify distcc to avoid doing that in the pump case.
The code which changes the object file name is in the dcc_run_job() function
in src/serve.c (look in particular for the calls to dcc_set_output(), but
other parts of the function would need modification too).
But I guess if you're not going to be using pump mode, that wouldn't help
you.

You may find that the object files are more deterministic if you don't pass
the "-g" flag to the compiler.
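
To find out whether the temporary names are the only thing that differs
between two builds, one option is to compare checksums computed over a
masked copy of the bytes, leaving the real files untouched.  The sketch
below is an assumption-laden illustration, not something distcc provides:
the function names and the regular expression are made up for this
example, and the pattern is modelled on the distccd_ac31c96a names quoted
in this thread.

```python
import hashlib
import re

# Assumed pattern, modelled on the names quoted in this thread.
TEMP_NAME = re.compile(rb"(distccd_)([0-9A-Za-z]+)(\.(?:ii|i|o)\b)")


def mask_temp_names(data):
    """Replace the random part of each temp name with 'X' of equal length."""
    return TEMP_NAME.sub(
        lambda m: m.group(1) + b"X" * len(m.group(2)) + m.group(3), data
    )


def normalized_md5(data):
    """MD5 of the bytes after masking; for comparison only."""
    return hashlib.md5(mask_temp_names(data)).hexdigest()
```

If two builds of the same source give the same normalized_md5 but
different plain md5sums, the temporary names are the only difference; if
the normalized values also differ, something else (a timestamp, for
instance) is varying between builds.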

Cheers,
   Fergus.


> A lot of the projects that I will be compiling include boost, and I believe
> that the pump fails on those, and falls back to regular mode.
>
> -Jeff
>
>
> On Tue, Jun 29, 2010 at 10:48 AM, Fergus Henderson <fergus at google.com> wrote:
>
>> Did you try using pump mode?
>> That should give you a better build speed-up and may also avoid this
>> issue.
>>
>> On Jun 29, 2010 6:32 AM, "Jeff Kilpatrick" <kilpatrick.jeff at gmail.com>
>> wrote:
>> > Oops, my original response went directly to Ihar, rather than to the list.
>> >
>> > ----
>> >
>> >
>> >
>> > Thank you for your response.
>> >
>> > We do have a tool internally that could 'scrub' the object file of its
>> > dynamic symbols, and could be adapted for this purpose. However, I'm
>> > hesitant to modify the .o and .so files with an external tool, as in
>> > some cases it may be hiding a legitimate issue. Once an exception makes
>> > it into the code, it's tempting to continue adding exceptions to fix
>> > issues. Before you know it, you have 600 branches with unique 'fixes'
>> > to them :)
>> >
>> > Once we get a consistent checksum on the .o and .so files, they'll be
>> > packaged into a .iso, which will also need to be repeatable. This can be
>> > challenging as well, since attributes on the files can affect the final
>> > checksum.
>> >
>> > -Jeff
>> >
>> >
>> > On Tue, Jun 29, 2010 at 6:58 AM, Ihar `Philips` Filipau <
>> > thephilips at gmail.com> wrote:
>> >
>> >> Hi Jeff!
>> >>
>> >> You can try to compute the checksum only over the ELF segments which are
>> >> actually derived from the source code, omitting the segments with the
>> >> compiler's extra info. I do not know of any ready-made tool for this
>> >> purpose, but coding something like this - print to stdout all segments
>> >> except the blacklisted ones - shouldn't be too complicated.
>> >>
>> >>
>> >> On Tue, Jun 29, 2010 at 11:41 AM, Jeff Kilpatrick <
>> >> kilpatrick.jeff at gmail.com> wrote:
>> >>
>> >>> Thank you for your response.
>> >>>
>> >>> Yes, this is the only difference in the object file. We've taken great
>> >>> pains over the last few years, removing anything that would cause
>> >>> checksums to mismatch.
>> >>>
>> >>> I will do some research myself, and talk to a few developers to see if
>> >>> they can help me.
>> >>>
>> >>> Thanks
>> >>> -Jeff
>> >>>
>> >>>
>> >>> On Tue, Jun 29, 2010 at 1:32 AM, Martin Pool <mbp at sourcefrog.net> wrote:
>> >>>
>> >>>> On 29 June 2010 13:02, Jeff Kilpatrick <kilpatrick.jeff at gmail.com>
>> >>>> wrote:
>> >>>> > Hello,
>> >>>> >
>> >>>> > At my work, we've just begun to investigate how much of an impact
>> >>>> > distcc will have on our builds.
>> >>>> >
>> >>>> > We typically perform 200 builds a week, ranging from a thousand lines
>> >>>> > of code up to 600,000 lines of code each. Our back-end build scripts
>> >>>> > are based on Python, and use Linux make to build. We are running
>> >>>> > VMware images on a blade cluster, and each of our three new build
>> >>>> > servers has 20 GHz of processing power, with 4 GB of RAM. Our primary
>> >>>> > build environments are loopback ISOs from a central CIFS server,
>> >>>> > unioned together with unionfs. Our source code is then copied into
>> >>>> > this environment, and we proceed with our build, using chroot to
>> >>>> > enter our build environment. Our 'distcc' machines use the same
>> >>>> > loopback system, with only our OS and distcc being accessible.
>> >>>>
>> >>>> That's pretty cool.
>> >>>>
>> >>>> > One of the most important things for our builds, due to the market
>> >>>> > that we are in, is that our builds must be reproducible, with
>> >>>> > repeatable md5sums on our shared objects, based on the same label and
>> >>>> > same dependencies. In our recent tests, we were able to take a
>> >>>> > particular build from 24 minutes to 14 minutes, then finally 5
>> >>>> > minutes, using distcc and adjusting our VMs. However, when performing
>> >>>> > an md5sum on our final shared objects / object files, the checksums
>> >>>> > change every build. We dropped down to just using g++ to perform our
>> >>>> > linking, all locally, but our object files are still mismatching.
>> >>>> >
>> >>>> > In the object files' `objdump -s` output, it appears that an entry is
>> >>>> > being made into all our object files with the following syntax:
>> >>>> > "distccd_XXXXX", with XXXXX being a seemingly random combination of
>> >>>> > characters.
>> >>>>
>> >>>> Hi Jeff,
>> >>>>
>> >>>> I think this is coming from gcc recording the input file name in the
>> >>>> object file. distccd_xxxx.ii is the temporary file name used on the
>> >>>> server.
>> >>>>
>> >>>> > In the same object file, compiled locally without distcc, we get a
>> >>>> > rather generic <built-in> placeholder.
>> >>>>
>> >>>> I think this means it's coming from the builtin preprocessor.
>> >>>>
>> >>>> I probably won't have time to work on this myself but if you have a
>> >>>> programmer interested in it there are two possible avenues:
>> >>>>
>> >>>> - make gcc read from a file called <built-in> in a temporary
>> >>>>   subdirectory
>> >>>>
>> >>>> - find some way to stop it recording the compiler input file name
>> >>>>
>> >>>> Is that the only difference in the object files? It's pretty common
>> >>>> for compilers to also record something about the time the compilation
>> >>>> was run or for source files to build this in, which would mean they
>> >>>> change every time.
>> >>>>
>> >>>> >
>> >>>> > I've reviewed the source code for distcc, and seen a few references
>> >>>> > to this distccd_xxxxx. Unfortunately, I'm not a programmer, and thus
>> >>>> > am at a loss on how to further troubleshoot this, or even whether
>> >>>> > it's possible to get consistent checksums with distcc.
>> >>>> >
>> >>>> >
>> >>>> > Versions
>> >>>> > =======
>> >>>> > g++ (Gentoo 4.3.2-r4 p1.8, pie-10.1.5) 4.3.2
>> >>>> >
>> >>>> > distcc 3.1 i686-pc-linux-gnu
>> >>>> > (protocols 1, 2 and 3) (default port 3632)
>> >>>> > built Mar 29 2010 10:55:35
>> >>>> >
>> >>>> > Kernel: 2.6.9-89.ELsmp
>> >>>> >
>> >>>> > Command being issued:
>> >>>> > DISTCC_VERBOSE=1 make -j24 CXX="distcc"
>> >>>> >
>> >>>> > Here's the partial output of objdump -s:
>> >>>> > 04f0 00030000 5f6d6f76 655f636f 6e737472 ...._move_constr
>> >>>> > 0500 7563745f 66776b2e 68000300 00474454 uct_fwk.h....GDT
>> >>>> > 0510 79706573 2e68000a 00007365 72646566 ypes.h....serdef
>> >>>> > 0520 732e6800 01000073 75666669 782e6870 s.h....suffix.hp
>> >>>> > 0530 70000b00 00646973 74636364 5f616333 p....distccd_ac3
>> >>>> > 0540 31633936 612e6969 000c0000 61646c5f 1c96a.ii....adl_
>> >>>> > 0550 62617272 6965722e 68707000 0d000062 barrier.hpp....b
>> >>>> > 0560 6f6f6c5f 6677642e 68707000 0e000069 ool_fwd.hpp....i
>> >>>> > 0570 6e746567 72616c5f 635f7461 672e6870 ntegral_c_tag.hp
>> >>>> > 0580 70000e00 00766f69 645f6677 642e6870 p....void_fwd.hp
>> >>>> >
>> >>>> > Thank you for reviewing my issue.
>> >>>> >
>> >>>> > -Jeff
>> >>>> >
>> >>>> > __
>> >>>> > distcc mailing list http://distcc.samba.org/
>> >>>> > To unsubscribe or change options:
>> >>>> > https://lists.samba.org/mailman/listinfo/distcc
>> >>>> >
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Martin
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Don't walk behind me, I may not lead.
>> >> Don't walk in front of me, I may not follow.
>> >> Just walk beside me and be my friend.
>> >> -- Albert Camus (attributed to)
>> >>
>>
>
>


-- 
Fergus Henderson <fergus at google.com>