[distcc] Parsing boost headers

Alastair Rankine arsptr at internode.on.net
Thu Feb 17 04:17:26 MST 2011


Thanks Martin and Fergus for the feedback.

I'm happy to report some more progress on improving the include_server. Previously I was fixing bugs; now I'm looking more at performance, particularly when parsing boost headers. According to my tests, the parser now runs about 1000x faster when evaluating the include closure for a test file.

Unfortunately these improvements may not be enough to get a successful distributed compile with the boost headers. (See Bad News below)

In any case, I've pushed the changes up to my branch here: https://code.launchpad.net/~arankine/distcc/issue16


OK, so what did I do to improve the performance so much?

Firstly I implemented the optimisation mentioned in a previous email: removing the unexpanded function-like macro from the set of possible expansions. Previously, if a macro call "FOO()" came up for evaluation, the unexpanded "FOO()" was added to the list of possible expansions before the FOO macro definition was looked up and expanded. All I did was skip adding "FOO()" to the set of possible expansions and instead immediately recurse into its definitions.
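To make the idea concrete, here is a minimal Python sketch of that first optimisation. The function and data-structure names are hypothetical (the real include_server code is structured differently); it just shows the change from keeping the unexpanded call as a candidate to recursing straight into the definitions:

```python
def expansions(call, macro_table):
    """Return the set of possible expansions for a macro call.

    call is a (name, args) tuple; macro_table maps a macro name to a
    list of its possible definitions (a macro may be defined several
    times under different preprocessor conditions, so all candidates
    are kept).  Hypothetical sketch, not the actual include_server API.
    """
    name, args = call
    results = set()
    # Old behaviour: the unexpanded call itself was also kept as a
    # candidate expansion, multiplying the search space at every level:
    # results.add("%s(%s)" % (name, ",".join(args)))
    # New behaviour: go straight to the definitions and expand there.
    for definition in macro_table.get(name, []):
        results.add(definition(*args))
    return results
```

For example, with macro_table = {"FOO": [lambda a, b: a + "_" + b]}, evaluating the call ("FOO", ["x", "y"]) yields just the expanded result rather than the expansion plus the unexpanded "FOO(x,y)".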

The next thing was to observe that a few key boost macros are multiply-defined in order to work around compiler bugs. Crucially, some of these are used in calculating include file names, particularly in the boost preprocessor and MPL libraries. The fix here was a bit hacky, but it worked: I defined a set of hard-coded "override" macros which always take precedence over those declared in the header files.

For example, boost defines BOOST_PP_CAT(a,b) in order to do the a##b operation in a portable manner across many compilers. However the multiple possible definitions (which you can see in the test file below) lead to an explosion in the number of expansions inside the macro evaluation code. The solution was to define an override macro for BOOST_PP_CAT (and a few others) and short-circuit many possible evaluation paths.

The second optimisation (override macros) also has the benefit that you can prevent many unneeded header files from being included in the closure. For example, there are directories of header files in boost/mpl/aux_ which are compiler-specific and are selected at preprocessing time based on which compiler is in use. By creating an override macro definition (BOOST_MPL_CFG_COMPILER_DIR) we can just point at the gcc-specific headers and ignore the rest.
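A rough Python sketch of how such an override table might work, again with hypothetical names and structure (the actual branch code differs); the point is simply that the hard-coded definitions always win over whatever the headers declare:

```python
# Hypothetical override table: these definitions always take
# precedence over any definitions found while parsing the headers.
OVERRIDE_MACROS = {
    # Collapse the many compiler-specific definitions of BOOST_PP_CAT
    # down to the single behaviour that matters: a ## b.
    "BOOST_PP_CAT": [lambda a, b: a + b],
    # Pin the compiler-specific MPL header directory to gcc's,
    # so the other compilers' directories never enter the closure.
    "BOOST_MPL_CFG_COMPILER_DIR": [lambda: "gcc"],
}

def lookup(name, macro_table):
    """Look up a macro's definitions, letting overrides win."""
    return OVERRIDE_MACROS.get(name) or macro_table.get(name, [])
```

With this, a header that redefines BOOST_PP_CAT five different ways contributes exactly one candidate expansion instead of five, which is where the short-circuiting of evaluation paths comes from.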


The results:

With neither optimisation, my test file was parsed in approx 110s.
With the first optimisation, the parsing time dropped to approx 6s.
With the first and second optimisations, the parsing time dropped to approx 0.06s.

Anyway here is the test file if you want to have a play: http://pastebin.com/hhQCQg7y (evalperf.cpp)
Here is the associated test script; obviously you'll need to modify a few file paths: http://pastebin.com/AfeFWZyy (evalperf.py)


Now the Bad News.

Basically it seems that even this may not be enough to parse the boost headers. Some of them parse just fine, but others will probably still result in timeouts, owing to the sheer number and complexity of the boost headers.

I tested the include parser on the boost/spirit/include/qi.hpp header (a particularly complex case), which returned almost 1500 header files in the closure. That is likely to take some time, no matter how fast the parser is!


There are a number of possible next steps here, but I'm going to take a bit more time to examine the feasibility of these changes in a real production environment before deciding how to proceed.


