Using submodules for third_party/

Stefan (metze) Metzmacher metze at samba.org
Mon Dec 8 08:35:13 MST 2014


On 08.12.2014 at 14:56, Simo wrote:
> On Mon, 2014-12-08 at 10:01 +0100, Stefan (metze) Metzmacher wrote:
>> Hi Jelmer,
>>
>>> At the moment we're manually bundling a bunch of third party libraries
>>> in third_party/. Rather than keeping a (usually partial) copy of these
>>> libraries in our Git repository, I would propose using git
>>> submodules where possible (in other words, where the upstream is using
>>> git). Submodules have come a long way since they were originally
>>> introduced in Git.
>>>
>>> Using submodules would have the following advantages. Mainly:
>>>
>>> * it is easy to avoid bundled third party libraries by simply not
>>>   running 'git submodule init'.
>>>
>>> * easy to review updates of upstream revisions we bundle; updating a
>>>   submodule shows up as a one-line diff. This means we can be sure
>>>   we're using an unmodified upstream revision; at the moment this is
>>>   hard to verify. You have to manually pull down a copy to verify that
>>>   the changes being made to the copy of a third party library are
>>>   the same as in the upstream repo.
>>>
>>> Some other nice benefits:
>>>
>>> * we're sure we always ship the pristine upstream source, which is
>>>   what the system version would provide too
>>>
>>> * easy to update, allows killing update-external.sh
>>>
>>> * reduces unnecessary growth of our own git repo :)
>>>
>>> There are two minor downsides I can think of:
>>>
>>> * after checkout, it is necessary to run 'git submodule init' to
>>>   do the initial checkout of submodules and then run 'git submodule
>>>   update' whenever there are changes to the submodules. This
>>>   can be avoided by setting the 'fetch.recurseSubmodules' setting in
>>>   Git to 'yes'.
>>>
>>> * if the upstream repository is down for some reason, you can't check
>>>   out the third party library. We could work around this by hosting
>>>   our own clone of third party libraries on git.samba.org.
>>>
>>>   That said, I don't think such a workaround is necessary. In the rare
>>>   cases that the upstream repository is down, users can always install the
>>>   system version of an external library (since we would only use
>>>   submodules for third party libraries).
>>
>> I can think of the following problem: an upstream project changes the URL
>> of its repo. A few years later you check out an older Samba version in
>> order to track down a customer bug, and that old Samba version still
>> references the old upstream URL...
>>
>>>   This also only affects new checkouts and fetches of changes to the
>>>   submodules. If the submodule reference doesn't change, there is no
>>>   need for updates.
>>
>> My typical setup is the following:
>>
>> I have a bare repository on my laptop where I configured a lot of
>> remotes, a cronjob runs 'git remote update' every few minutes.
>>
>> Then I have working repositories/checkouts, which are configured like this:
>>
>> metze@SERNOX14:~/devel/samba/3.X/masterF$ cat .git/objects/info/alternates
>> /home/metze/devel/samba/samba-bare.git/objects
>> metze@SERNOX14:~/devel/samba/3.X/masterF$ ls -la .git/refs/
>> total 64
>> drwxrws--- 6 metze metze  4096 Nov 16  2013 .
>> drwxrws--- 9 metze metze  4096 Dec  4 10:21 ..
>> drwxrws--- 2 metze metze  4096 Apr 21  2009 bisect
>> drwxrws--- 2 metze metze  4096 Dec  4 10:21 heads
>> lrwxrwxrwx 1 metze metze    51 Jun  7  2011 remotes -> /home/metze/devel/samba/samba-bare.git/refs/remotes
>> -rw-rw---- 1 metze metze    41 Nov 16  2013 stash
>> drwxrws--- 2 metze metze  4096 Apr 29  2010 stash.d
>> drwxrws--- 4 metze metze 40960 Jun 23 14:09 tags
>>
>> If I remember correctly, this setup wasn't supported with git submodules
>> when we last discussed this topic, so I NACKed the proposal.
>>
>> However, I'm open to reevaluating, but everything needs to be available
>> offline after doing a 'git clone git://git.samba.org/samba.git' with
>> 'fetch.recurseSubmodules = yes' configured in ~/.gitconfig. And it needs
>> to support my workflow...
>>
>> I just tested this with your repository which has submodules
>> in the following branch:
>> https://git.samba.org/?p=jelmer/samba.git;a=shortlog;h=refs/heads/for-review/submodules
>>
>> metze@SERNOX14:/dev/shm$ git config fetch.recurseSubmodules
>> yes
>> metze@SERNOX14:/dev/shm$ git clone git://git.samba.org/jelmer/samba.git
>> Cloning into 'samba'...
>> remote: Counting objects: 986939, done.
>> remote: Compressing objects: 100% (230556/230556), done.
>> remote: Total 986939 (delta 752585), reused 979504 (delta 745183)
>> Receiving objects: 100% (986939/986939), 228.08 MiB | 1.78 MiB/s, done.
>> Resolving deltas: 100% (752585/752585), done.
>> Checking connectivity... done.
>> metze@SERNOX14:/dev/shm$ cd samba/
>> metze@SERNOX14:/dev/shm/samba$ git show 5a0c331b259407896e63267e578efafee879ed4f | grep 'Subproject commit'
>> +Subproject commit 43c14fd73b3b94211ff8bfad8f894b48cce4e577
>> metze@SERNOX14:/dev/shm/samba$ git show 43c14fd73b3b94211ff8bfad8f894b48cce4e577
>> fatal: bad object 43c14fd73b3b94211ff8bfad8f894b48cce4e577
>>
>> As long as that doesn't work, it gets a NACK from me, sorry.
> 
> Would it be acceptable if you had a git-clone alias/script that did the
> right thing?

'git remote update' / 'git fetch' would need to do that
within a bare repository.
But I'm not sure how that could work: it would need to look
at every commit object and check for possible submodules.
And I'm not sure how the fetched objects would be referenced
so that 'git gc --prune' won't remove them.

Implementing this within git itself would be possible,
but I'm not sure it's easy to do...
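
As a rough illustration of the scan described above (not something git
does on its own here), the sketch below lists every submodule pointer
recorded anywhere in a repository's history. It relies on 'git ls-tree -r'
reporting submodule entries with mode 160000 (a gitlink). The function
names list_gitlinks and scan_repo_gitlinks are made up for this example.
It only finds the referenced commit IDs; fetching those objects and
keeping them safe from pruning remains an open question.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Read `git ls-tree -r` output on stdin and print "<commit-id> <path>" for
# every gitlink (submodule) entry. Each input line has the form
# "<mode> <type> <object>TAB<path>"; gitlinks use mode 160000.
list_gitlinks() {
    awk -F'\t' '{
        split($1, meta, " ")
        if (meta[1] == "160000")
            print meta[3], $2
    }'
}

# Walk every commit reachable from any ref of the repository in the
# current directory (a bare repository works too) and print the distinct
# submodule pointers found in their trees.
scan_repo_gitlinks() {
    git rev-list --all |
    while read -r commit; do
        git ls-tree -r "$commit" | list_gitlinks
    done |
    sort -u
}
```

Calling scan_repo_gitlinks inside the bare repository prints one
"<commit-id> <path>" line per distinct pointer. Listing the full tree of
every commit is likely to be slow on a history of Samba's size, so this is
a starting point rather than something to put in a cronjob as-is.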

metze


More information about the samba-technical mailing list