Using submodules for third_party/

Andrew Bartlett abartlet at samba.org
Sun Dec 7 00:36:06 MST 2014


On Sat, 2014-12-06 at 23:57 +0000, Jelmer Vernooij wrote:
> At the moment we're manually bundling a bunch of third party libraries
> in third_party/. Rather than keeping a (usually partial) copy of these
> libraries in our Git repository, I would propose using git
> submodules where possible (in other words, where the upstream is using
> git). Submodules have come a long way since they were originally
> introduced in Git.
> 
> Using submodules would have the following advantages. Mainly:
> 
> * it is easy to avoid bundled third party libraries by simply not
>   running 'git submodule init'.
> 
> * easy to review updates of upstream revisions we bundle; updating a
>   submodule shows up as a one-line diff. This means we can be sure
>   we're using an unmodified upstream revision; at the moment this is
>   hard to verify. You have to manually pull down a copy to verify that
>   the changes being made to the copy of a third party library are
>   the same as in the upstream repo.

I would really, really appreciate this.  I trust you totally, but I
still read over one of those large imports line by line, re-ran the
input script to double-check and asked you to publish it over an SSH
link.  While some of that will still be good practice, I really do
think a git hash would be a far better and safer reference.
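
To illustrate the point about a git hash as the reference: a submodule
is stored in the superproject as a single "gitlink" entry carrying the
upstream commit id, and that id is what a reviewer would compare.  A
minimal sketch in a scratch directory follows.  The repository names
(upstream, super), the path third_party/demo and the demo identity are
all made up for the example, and the protocol.file.allow override is
only there because the stand-in upstream is a local path.

```shell
#!/bin/sh
# Scratch demo: all repository names and paths below are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A stand-in for an upstream project's repository.
git init -q upstream
(cd upstream && echo v1 > lib.c && git add lib.c && git commit -q -m "upstream v1")
upstream_rev=$(git -C upstream rev-parse HEAD)

# A stand-in for our own tree, pinning upstream under third_party/.
git init -q super
cd super
# protocol.file.allow is only needed because the demo upstream is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" third_party/demo
git commit -q -m "third_party: add demo as a submodule"

# The superproject records one entry: mode 160000 plus the pinned commit id.
entry=$(git ls-tree HEAD third_party/demo)
echo "$entry"
mode=$(echo "$entry" | awk '{print $1}')
pinned_rev=$(echo "$entry" | awk '{print $3}')
```

In a real review, the pinned id could be compared against the commit or
tag the upstream project publishes.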

> Some other nice benefits:
> 
> * we're sure we always ship the pristine upstream source, which is
>   what the system version would provide too
> 
> * easy to update, allows killing update-external.sh
> 
> * reduces unnecessary growth of our own git repo :)
> 
> There are two minor downsides I can think of:
> 
> * after checkout, it is necessary to run 'git submodule init' to
>   do the initial checkout of submodules and then run 'git submodule
>   update' whenever there are changes to the submodules. This
>   can be avoided by setting the 'fetch.recurseSubmodules' setting in
>   Git to 'yes'.
> 
> * if the upstream repository is down for some reason, you can't check
>   out the third party library. We could work around this by hosting
>   our own clone of third party libraries on git.samba.org.
> 
>   That said, I don't think such a workaround is necessary. In the rare
>   cases that the upstream repository is down, users can always install the
>   system version of an external library (since we would only use
>   submodules for third party libraries).
> 
>   This also only affects new checkouts and fetches of changes to the
>   submodules. If the submodule reference doesn't change, there is no
>   need for updates.
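
For reference, the checkout steps quoted above can be tried in a
scratch directory, roughly as sketched below.  Everything named here
(upstream, super, checkout, third_party/demo, the demo identity) is
hypothetical, and the protocol.file.allow override is only needed
because the stand-in upstream is a local path.  One caveat, going by
git-config(1): fetch.recurseSubmodules controls what 'git fetch'
downloads, so 'git submodule update' (or a recursive clone) still
appears to be needed to populate the working tree.

```shell
#!/bin/sh
# Scratch demo: every repository name and path here is hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# Only needed because the demo upstream is a local path rather than a URL.
allow="-c protocol.file.allow=always"

# Build a stand-in upstream and a superproject that pins it.
git init -q upstream
(cd upstream && echo v1 > lib.c && git add lib.c && git commit -q -m "upstream v1")
git init -q super
(cd super &&
 git $allow submodule --quiet add "$work/upstream" third_party/demo &&
 git commit -q -m "third_party: add demo as a submodule")

# A plain clone leaves the submodule directory empty ...
git clone -q super checkout
cd checkout
before=$(ls third_party/demo | wc -l | tr -d ' ')

# ... until the two steps from the quoted mail are run.
git submodule --quiet init
git $allow submodule --quiet update
after=$(ls third_party/demo)
echo "files before: $before, after: $after"
```

Skipping the last two commands is what leaves the bundled copy out, as
described in the first advantage listed above.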

We certainly should be using this or signed tarballs - anything better
than a merely 'asserted' copy of upstream.  It also avoids the problem
of tinkering in external projects, which we used to do much more than
we do now.

To ensure I was informed about this topic, I followed all the links on
why 'you should not' use submodules and the suggested alternatives in
this article:
http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/

It seems that your suggestion avoids most of the issues raised there,
because we really do want to use submodules as static pointers into
released Git repos, not for ongoing development.  Indeed, we
deliberately wish to impede making direct changes to these directories.

Finally, I really do look forward to when a lorikeet-heimdal update
becomes just a submodule update :-)

+1

Andrew Bartlett

-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba



