Submodule improvements

Stefan (metze) Metzmacher metze at samba.org
Wed Jun 24 08:29:05 MDT 2015


Hi Jelmer,

here's my summary of why I think submodules would cause a lot of pain
if we used them.

Problem 1.) Working offline doesn't work as before

  --recurse-submodules=yes and
  git config --global "fetch.recurseSubmodules" "yes"
  have no impact when fetching into a bare repository or into other
  branches without a checkout. It only runs
  'git submodule update --init --recursive' in the local checkout.

  This means you can never be sure you have all the objects you need in
  order to reconstruct the working tree/checkout of every commit in the
  remote repository you're tracking.
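
  To see how large this gap is before going offline, something like the
  following could list every submodule URL referenced by any
  remote-tracking branch. It is only a sketch: the function name
  list_submodule_urls is made up for illustration, it relies on
  'git config --blob' (so it needs a reasonably recent git), and it
  assumes submodule names and URLs contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print "<ref> <submodule-name> <url>" for every
# remote-tracking branch whose tree contains a .gitmodules file.
# Must be run inside the top-level repository.
list_submodule_urls() {
    git for-each-ref --format='%(refname)' refs/remotes |
    while read -r ref; do
        # Skip refs whose tree has no .gitmodules blob.
        git cat-file -e "$ref:.gitmodules" 2>/dev/null || continue
        # Read the submodule URLs straight from the blob, without a checkout.
        git config --blob "$ref:.gitmodules" \
            --get-regexp '^submodule\..*\.url$' </dev/null |
        while read -r key url; do
            name=${key#submodule.}
            name=${name%.url}
            printf '%s %s %s\n' "$ref" "$name" "$url"
        done
    done
}
```

  Every URL printed this way is a repository whose objects a plain fetch
  of the top-level repository has not brought along.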

  The following would not work:

  - You have a long trip on the train
    and you want to do a git clone beforehand and work offline
    on the train:

    ~/$ git clone --recursive git://git.samba.org/jelmer/samba.git jelmer
    Initialized empty Git repository in /home/metze/jelmer/.git/
    remote: Counting objects: 1212511, done.
    remote: Compressing objects: 100% (281243/281243), done.
    remote: Total 1212511 (delta 933991), reused 1197759 (delta 919360)
    Receiving objects: 100% (1212511/1212511), 339.29 MiB | 45.48 MiB/s, done.
    Resolving deltas: 100% (933991/933991), done.

  - You are on the train and want to work (offline):

    ~/$ cd jelmer
    ~/jelmer/$ git branch -a
    * 3.2-util
      remotes/origin/3.2-util
      remotes/origin/HEAD -> origin/3.2-util
      remotes/origin/bsd-ctor
      remotes/origin/experimental
      remotes/origin/fix-ordering
      remotes/origin/master
      remotes/origin/move-python-extras
      remotes/origin/pep8-third-party
      remotes/origin/pristine-tar
      remotes/origin/pyldb-fixes
      remotes/origin/spelling
      remotes/origin/submodule-dnspython
      remotes/origin/submodules
      remotes/origin/unstable
      remotes/origin/upstream_4.2
      remotes/origin/v4-0-test
      remotes/origin/v4-1-test
      remotes/origin/v4-2-test
      remotes/origin/waitpid
    ~/jelmer/$ git checkout -b jelmer-submodules origin/submodules
    Branch jelmer-submodules set up to track remote branch submodules from origin by rebasing.
    Switched to a new branch 'jelmer-submodules'
    ~/jelmer/$ git submodule update --init
    Cloning into 'third_party/dnspython'...
    fatal: unable to access 'https://github.com/rthalley/dnspython.git/': Could not resolve host: github.com
    Clone of 'https://github.com/rthalley/dnspython.git' into submodule path 'third_party/dnspython' failed

  - Possible improvement:
    If submodules were part of the refs/ space of the top-level
    repository, I guess Problem 1. might be solved, e.g. by using
    something like refs/submodules/<submodulename>/* and having every
    object of the submodules available in the top-level repo. But that
    could still cause conflicts and problems if different branches
    reference submodules with the same name.
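
    The refs/submodules/ idea can be approximated by hand with a plain
    'git fetch' and an explicit refspec. The sketch below is only an
    illustration: the helper name fetch_submodule_refs and the
    heads/ sub-namespace are assumptions, and as far as I know
    'git submodule' does not consult such refs, so this only shows where
    the objects could live.

```shell
#!/bin/sh
# Hypothetical helper: copy the branches of one submodule repository into
# the top-level repository under refs/submodules/<name>/heads/*, so that
# the submodule's objects are stored next to the top-level objects.
# Must be run inside the top-level repository.
fetch_submodule_refs() {
    name=$1   # submodule name, e.g. third_party/dnspython
    url=$2    # submodule URL as recorded in .gitmodules
    # --no-tags keeps the submodule's tags out of the top-level refs/tags/.
    git fetch --quiet --no-tags "$url" \
        "+refs/heads/*:refs/submodules/$name/heads/*"
}
```

    The conflict mentioned above is still there: two branches that use
    the same submodule name for different repositories would fight over
    the same ref names.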

Problem 2.) Working with submodules is not transparent

 'git reset --hard <commit>', 'git checkout .',
 inserting 'x make -j' after each commit in git rebase -i,
 and a lot of other commands don't work as before.
 They no longer create a known working tree
 that matches 100% the content of the current HEAD commit.

 Things like .git/hooks/post-checkout are not triggered
 in all the places needed to work around the limitation.

 Following the example from Problem 1.:

 - You're lucky and online again:

   ~/jelmer/$ git submodule update --init
   Cloning into 'third_party/dnspython'...
   remote: Counting objects: 3761, done.
   remote: Total 3761 (delta 0), reused 0 (delta 0), pack-reused 3761
   Receiving objects: 100% (3761/3761), 1.12 MiB | 926.00 KiB/s, done.
   Resolving deltas: 100% (2682/2682), done.
   Checking connectivity... done.
   Submodule path 'third_party/dnspython': checked out '43c14fd73b3b94211ff8bfad8f894b48cce4e577'
   Cloning into 'third_party/zlib'...
   remote: Counting objects: 4194, done.
   remote: Total 4194 (delta 0), reused 0 (delta 0), pack-reused 4194
   Receiving objects: 100% (4194/4194), 3.49 MiB | 1.22 MiB/s, done.
   Resolving deltas: 100% (2453/2453), done.
   Checking connectivity... done.
   Submodule path 'third_party/zlib': checked out '50893291621658f355bc5b4d450a8d06a563053d'

   ~/jelmer/$ git log --pretty=oneline -5
   5fbe62fa83cd91d283fc66003b01a501ab21514d Change zlib to a submodule.
   5a0c331b259407896e63267e578efafee879ed4f Move dnspython into a submodule.
   49e208c2b34616d4308e0a3fcf3029c72ee7059f smbd: Use read_data() in notify_inotify
   b322ea2059604ed94aa2170e634a59cdb4561681 lib: Add a simple read_data call without NTSTATUS
   78d1c04e1ac48711ad6aa3d08b33a51848c49303 lib: Make write_data take a const void *

 - Now we want to change the submodule reference commit.

   ~/jelmer/$ cd third_party/zlib/
   ~/jelmer/third_party/zlib/$ git log --pretty=oneline -5
   50893291621658f355bc5b4d450a8d06a563053d zlib 1.2.8
   5b5da45640eb33d0d102c0ce5f6967ca3d727dd7 Fix mixed line endings in contrib/vstudio.
   2dad5389af478ae7d09f4479650c1471dc13e0f5 Correct spelling error in zlib.h.
   b4d802825ac374af31e02e03aeb39d040b20ef0b Clean up contrib/vstudio [Roß].
   f5ec26344f2c9d4facc000cd9c8495e279a0eba6 Update some copyright years.
   ~/jelmer/third_party/zlib/$ git reset --hard HEAD^
   HEAD is now at 5b5da45 Fix mixed line endings in contrib/vstudio.
   ~/jelmer/third_party/zlib/$ cd -
   ~/jelmer/
   ~/jelmer/$ git status
   On branch jelmer-submodules
   Your branch is up-to-date with 'origin/submodules'.

   Changes not staged for commit:
     (use "git add <file>..." to update what will be committed)
     (use "git checkout -- <file>..." to discard changes in working directory)

           modified:   third_party/zlib (new commits)

   no changes added to commit (use "git add" and/or "git commit -a")
   ~/jelmer/$ git commit -a -m "change zlib commit"
   [jelmer-submodules 8d6a9dc] change zlib commit
    1 file changed, 1 insertion(+), 1 deletion(-)

 - Now we want to find which commit introduced a problem:

    # normally I'd do something like
    # EDITOR=true git rebase -i --exec 'make -j' HEAD^^
    # but in order to demonstrate the problem I use 'git status'
    # as the command
    ~/jelmer/$ EDITOR=true git rebase -i --exec 'git status' HEAD^^
    Executing: git status
    rebase in progress; onto 5a0c331
    You are currently editing a commit while rebasing branch 'jelmer-submodules' on '5a0c331'.
      (use "git commit --amend" to amend the current commit)
      (use "git rebase --continue" once you are satisfied with your changes)

    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)

            modified:   third_party/zlib (new commits)

    no changes added to commit (use "git add" and/or "git commit -a")
    Executing: git status
    rebase in progress; onto 5a0c331
    You are currently editing a commit while rebasing branch 'jelmer-submodules' on '5a0c331'.
      (use "git commit --amend" to amend the current commit)
      (use "git rebase --continue" once you are satisfied with your changes)

    nothing to commit, working directory clean
    Successfully rebased and updated refs/heads/jelmer-submodules.

  - This means that the submodules are not changed in the rebase steps.
  - Let's check whether 'git reset --hard HEAD^' can revert the change:

    ~/jelmer/$ git reset --hard HEAD^
    HEAD is now at 5fbe62f Change zlib to a submodule.
    ~/jelmer/$ echo $?
    0
    ~/jelmer/$ git status
    On branch jelmer-submodules
    Your branch is up-to-date with 'origin/submodules'.

    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)

            modified:   third_party/zlib (new commits)

    no changes added to commit (use "git add" and/or "git commit -a")

  - git reset --hard HEAD^ doesn't report any error, even though it
    doesn't reset everything.
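
    For comparison, getting back to a tree that really matches HEAD
    takes an explicit second step after the reset. The wrapper below is
    a hypothetical sketch (the name reset_hard_with_submodules is not
    from any existing tool). It still needs network access whenever a
    submodule commit is missing locally, so it does not help with
    Problem 1.

```shell
#!/bin/sh
# Hypothetical wrapper: reset the top-level working tree, then bring every
# submodule checkout back to the commit recorded in the new HEAD.
reset_hard_with_submodules() {
    git reset --hard "$1" &&
    git submodule update --init --recursive
}

# The same idea for the rebase case: sync the submodules before each
# build step (not run here, shown as a comment only):
#   EDITOR=true git rebase -i \
#       --exec 'git submodule update --init --recursive && make -j' HEAD^^
```

    Every script, alias and habit that relies on 'git reset --hard' or
    'git checkout' alone would have to be adapted like this.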

Problem 3.) Problem 1. + 2. both destroy a workflow using a bare proxy repository.

  - My setup looks like this (some others use a similar setup).

    ~/.gitconfig has this:

    [core]
            logallrefupdates = true
    [gc]
            auto = 0
            autopacklimit = 0
            packrefs = false
            autopackrefs = 0
            # NOTE: never is available since git 1.7.0.3
            #       older versions need '9000 days'
            #
            #       Usage of just an int value '9000'
            #       will be mapped to 'now' and 'git gc'
            #       will delete all your reflogs!!!!
            #
            reflogexpire = never
            reflogexpireunreachable = never
            # NOTE: '9000 days' doesn't work here
            #
            # fatal: bad config value for 'gc.rerereresolved' in
            # /home/metze/.gitconfig
            # error: failed to run rerere
            #
            rerereresolved = 9000
            rerereunresolved = 9000

    ~/samba/bare.git is a repository created with 'git --bare init'
    - git remote add origin git://git.samba.org/samba.git
    - For each developer I run
      'git remote add developer-wip git://git.samba.org/developer/samba.git'
      in order to track remote branches.
    - A cron job runs 'git remote update' every 5 minutes, which means
      I have all commits of all developers available in that bare repo.

    ~/samba/templare.git is a repo generated with:
    - git init
    - ln -s ~/samba/bare.git/refs/remotes .git/refs/remotes
    - echo "~/samba/bare.git/objects" > .git/objects/info/alternates
    - git config "remote.local.url" "~/samba/bare.git"
    - git config "remote.local.fetch" "+refs/remote/origin/*:refs/remote/origin/*"
    - git config --add "remote.local.fetch" "+refs/tags/*:refs/tags/*"

    I have about 10 working repositories created like this:
    cp -a ~/samba/templare.git ~/samba/v4-2-test
    cd ~/samba/bare.git
    git remote add local-v4-2-test ~/samba/v4-2-test
    cd ~/samba/v4-2-test

    That means that the following works in all working repositories
    completely offline! (While offline, the cron job fetches only from
    the local repositories.)

    git branch -a # shows all branches of all remotes tracked in the bare repo
    git show <commit> # works for any commit ever fetched into the bare repo
    git reset --hard <commit> # restores the full working directory for any commit
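
    The template steps above could be wrapped into one helper along
    these lines. This is a sketch under assumptions, not my actual
    script: the name make_working_repo is made up, the remote.local.*
    configuration is left out, and absolute paths are used for the
    symlink and the alternates entry.

```shell
#!/bin/sh
# Hypothetical helper: create a working repository that borrows objects
# and remote-tracking refs from a bare proxy repository.
# Both arguments are expected to be absolute paths.
make_working_repo() {
    bare=$1   # e.g. /home/user/samba/bare.git
    dest=$2   # e.g. /home/user/samba/v4-2-test
    git init -q "$dest" || return 1
    # Share the remote-tracking refs of the bare repo via a symlink.
    # Only loose refs are visible this way, which is why ref packing
    # must stay disabled in the bare repo.
    ln -s "$bare/refs/remotes" "$dest/.git/refs/remotes" || return 1
    # Borrow all objects from the bare repo instead of copying them.
    printf '%s\n' "$bare/objects" > "$dest/.git/objects/info/alternates"
}
```

    None of this has a counterpart for submodule objects, which live in
    separate repositories that the bare proxy never fetches.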

Summary:
- I think these problems are way too much trouble compared to what
  submodules can solve.

- What is the goal we want to achieve? I think commits to third_party/
  subdirectories should be rejected.

- I think we should have improved scripts to import third_party code.
  - One script would calculate a checksum of all content and the directory
    structure in third_party/project, which would be stored in
    third_party/project.checksums
  - Instead of just overwriting the target directory with rsync,
    the import script should also run the script to generate
    third_party/project.checksums. And it should create a commit
    for the import, which includes the repository URL and the commit
    used. It should also verify that the working tree was clean before
    the commit.
  - The autobuild script would also generate third_party/project.checksums
    and the clean-source-tree.sh script would reject a modification
    which didn't use the import script.
  - I have started with this, but it's work in progress:

https://git.samba.org/?p=metze/samba/wip.git;a=shortlog;h=refs/heads/master3-submodule-problems
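
The checksum step could look roughly like the sketch below. It is not
the code from the branch above: the function names are hypothetical, it
assumes GNU sha256sum is available, it ignores symlinks and file modes,
and it breaks on file names containing newlines.

```shell
#!/bin/sh
# Hypothetical sketch: print one line per directory and one sha256 line
# per regular file below the given directory, in a stable order.
generate_checksums() {
    (
        cd "$1" || exit 1
        # Directory structure, so that added or removed (even empty)
        # directories change the result.
        find . -type d | LC_ALL=C sort | sed 's/^/D /'
        # File contents.
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            sha256sum "$f"
        done
    )
}

# Succeeds only if the directory still matches the stored checksums file.
verify_checksums() {
    generate_checksums "$1" | cmp -s - "$2"
}
```

The import script would run
'generate_checksums third_party/project > third_party/project.checksums'
before committing, and a check like clean-source-tree.sh could call
verify_checksums and fail on a mismatch.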

Sorry, but I'm really against using submodules in Samba.

metze

On 24.05.2015 at 17:48, Jelmer Vernooij wrote:
> On Thu, May 21, 2015 at 02:45:48PM +0200, Andrew Bartlett wrote:
>> On Mon, 2015-05-18 at 20:23 +0000, Jelmer Vernooij wrote:
>>> This patchset improves our handling of submodules in our build system,
>>> by preventing developers from building with outdated submodules if
>>> there are any present. This should hopefully address the concerns that
>>> Andrew raised in the last thread.
>>>
>>> While I was there, I also removed support for running Samba from a
>>> bzr repository rather than Git. I'm pretty sure I was the only one who
>>> ever used that.
>>
>> Thanks Jelmer,
>>
>> I've pushed those with my review and then looked with metze to see what
>> we could do to make this work.  He can speak for himself, but when
>> demonstrating his concerns to me the major issue that he hit was the
>> boundary conditions when moving back and forth (such as in a rebase)
>> over the boundary where a submodule is put into use, with untracked
>> files appearing.  
> That should be a temporary issue while we migrate; once we have, it's
> only a minor nuisance (you have to rm -rf third_party/dnspython before
> switching to an old branch).
> 
> As a workaround, we could add the submodule in a different path
> than where the original code used to live so they don't conflict. E.g.
> bundled/dnspython rather than third_party/dnspython.
> 
>> We did however show that the issues around initialisation of the
>> submodule git modules could be handled fairly well by the fact that
>> submodule objects live in the .git directory, or a helper script for the
>> original git clone/fetch.
> What specifically doesn't work there that isn't handled by
> --recursive?
> 
> Cheers,
> 
> Jelmer
> 
