Submodule improvements
Stefan (metze) Metzmacher
metze at samba.org
Wed Jun 24 08:29:05 MDT 2015
Hi Jelmer,
here's my summary of why I think submodules would cause a lot of pain
if we used them.
Problem 1.) Working offline doesn't work as before
--recurse-submodules=yes and
git config --global "fetch.recurseSubmodules" "yes"
have no effect when fetching into a bare repository or into branches
without a checkout. They only run
'git submodule update --init --recursive' in the local checkout.
This means you can never be sure you have all the objects you need in
order to reconstruct the working tree/checkout of every commit in the
remote repository you're tracking.
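To see which submodule commits a given superproject commit depends on, the gitlink entries in its tree can be listed without any checkout. This is only a sketch: the function name list_gitlinks is made up for illustration, it relies on nothing beyond 'git ls-tree', and paths containing whitespace are not handled.

```shell
#!/bin/sh
# list_gitlinks is a hypothetical helper, not a git command.
# Print "<submodule commit> <path>" for every gitlink (tree entry with
# mode 160000) recorded in the given superproject commit.
# It only reads the superproject's own objects, so it also works in a
# bare repository and while offline.
list_gitlinks() {
    git ls-tree -r "$1" | awk '$1 == "160000" { print $3, $4 }'
}
```

Something like 'list_gitlinks origin/submodules' prints one line per submodule. Whether those commits are actually available locally is a separate question, because they belong to the submodule's repository and not to the superproject's object store.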
The following would not work:
- You have a long trip on the train
and you want to do a git clone before and work offline
on the train:
~/$ git clone --recursive git://git.samba.org/jelmer/samba.git jelmer
Initialized empty Git repository in /home/metze/jelmer/.git/
remote: Counting objects: 1212511, done.
remote: Compressing objects: 100% (281243/281243), done.
remote: Total 1212511 (delta 933991), reused 1197759 (delta 919360)
Receiving objects: 100% (1212511/1212511), 339.29 MiB | 45.48 MiB/s,
done.
Resolving deltas: 100% (933991/933991), done.
- You are in the train and want to work (offline):
~/$ cd jelmer
~/jelmer/$ git branch -a
* 3.2-util
remotes/origin/3.2-util
remotes/origin/HEAD -> origin/3.2-util
remotes/origin/bsd-ctor
remotes/origin/experimental
remotes/origin/fix-ordering
remotes/origin/master
remotes/origin/move-python-extras
remotes/origin/pep8-third-party
remotes/origin/pristine-tar
remotes/origin/pyldb-fixes
remotes/origin/spelling
remotes/origin/submodule-dnspython
remotes/origin/submodules
remotes/origin/unstable
remotes/origin/upstream_4.2
remotes/origin/v4-0-test
remotes/origin/v4-1-test
remotes/origin/v4-2-test
remotes/origin/waitpid
~/jelmer/$ git checkout -b jelmer-submodules origin/submodules
Branch jelmer-submodules set up to track remote branch submodules
from origin by
rebasing.
Switched to a new branch 'jelmer-submodules'
~/jelmer/$ git submodule update --init
Cloning into 'third_party/dnspython'...
fatal: unable to access
'https://github.com/rthalley/dnspython.git/': Could not
resolve host: github.com
Clone of 'https://github.com/rthalley/dnspython.git' into submodule path
'third_party/dnspython' failed
- Possible improvement:
If submodules were part of the refs/ space of the top level
repository, I guess Problem 1. might be solved, e.g. by using
something like refs/submodules/<submodulename>/* and having every
object of the submodules available in the top level repo. But that
could still cause conflicts and problems if different branches
reference submodules with the same name.
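The idea above can be approximated by hand with an explicit refspec. The following is a rough sketch and not an existing git feature: the function name and the heads/ and tags/ layout below refs/submodules/<submodulename>/ are assumptions, and 'git submodule update' would not look at these refs by itself.

```shell
#!/bin/sh
# fetch_submodule_refs is a hypothetical helper; the heads/ and tags/
# sub-layout below refs/submodules/<name>/ is an assumption.
# Run inside the top level repository: copies all branches and tags of
# the submodule at <url>, with their objects, into the top level repo.
fetch_submodule_refs() {
    name=$1
    url=$2
    git fetch --no-tags "$url" \
        "+refs/heads/*:refs/submodules/$name/heads/*" \
        "+refs/tags/*:refs/submodules/$name/tags/*"
}
```

Because the namespace is keyed only by the submodule name, two branches that use the same name for different upstream repositories would overwrite each other's refs, which is the conflict mentioned above.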
Problem 2.) Working with submodules is not transparent
'git reset --hard <commit>', 'git checkout .',
inserting 'x make -j' after each commit in 'git rebase -i',
and a lot of other commands don't work as before.
They no longer create a known working tree
that matches 100% the content of the current HEAD commit.
Hooks like .git/hooks/post-checkout are not triggered
in all the places where they would be needed to work around the limitation.
Following the example from Problem 1.:
- You're lucky and online again:
~/jelmer/$ git submodule update --init
Cloning into 'third_party/dnspython'...
remote: Counting objects: 3761, done.
remote: Total 3761 (delta 0), reused 0 (delta 0), pack-reused 3761
Receiving objects: 100% (3761/3761), 1.12 MiB | 926.00 KiB/s, done.
Resolving deltas: 100% (2682/2682), done.
Checking connectivity... done.
Submodule path 'third_party/dnspython': checked out
'43c14fd73b3b94211ff8bfad8f894b48cce4e577'
Cloning into 'third_party/zlib'...
remote: Counting objects: 4194, done.
remote: Total 4194 (delta 0), reused 0 (delta 0), pack-reused 4194
Receiving objects: 100% (4194/4194), 3.49 MiB | 1.22 MiB/s, done.
Resolving deltas: 100% (2453/2453), done.
Checking connectivity... done.
Submodule path 'third_party/zlib': checked out
'50893291621658f355bc5b4d450a8d06a563053d'
~/jelmer/$ git log --pretty=oneline -5
5fbe62fa83cd91d283fc66003b01a501ab21514d Change zlib to a submodule.
5a0c331b259407896e63267e578efafee879ed4f Move dnspython into a submodule.
49e208c2b34616d4308e0a3fcf3029c72ee7059f smbd: Use read_data() in
notify_inotify
b322ea2059604ed94aa2170e634a59cdb4561681 lib: Add a simple read_data
call without NTSTATUS
78d1c04e1ac48711ad6aa3d08b33a51848c49303 lib: Make write_data take a
const void *
- Now we want to change the submodule reference commit.
~/jelmer/$ cd third_party/zlib/
~/jelmer/third_party/zlib/$ git log --pretty=oneline -5
50893291621658f355bc5b4d450a8d06a563053d zlib 1.2.8
5b5da45640eb33d0d102c0ce5f6967ca3d727dd7 Fix mixed line endings in
contrib/vstudio.
2dad5389af478ae7d09f4479650c1471dc13e0f5 Correct spelling error in
zlib.h.
b4d802825ac374af31e02e03aeb39d040b20ef0b Clean up contrib/vstudio
[Roß].
f5ec26344f2c9d4facc000cd9c8495e279a0eba6 Update some copyright years.
~/jelmer/third_party/zlib/$ git reset --hard HEAD^
HEAD is now at 5b5da45 Fix mixed line endings in contrib/vstudio.
~/jelmer/third_party/zlib/$ cd -
~/jelmer/
~/jelmer/$ git status
On branch jelmer-submodules
Your branch is up-to-date with 'origin/submodules'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working
directory)
modified: third_party/zlib (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
~/jelmer/$ git commit -a -m "change zlib commit"
[jelmer-submodules 8d6a9dc] change zlib commit
1 file changed, 1 insertion(+), 1 deletion(-)
- Now we want to find which commit introduced a problem:
# normally I'd do something like
# EDITOR=true git rebase -i --exec 'make -j' HEAD^^
# but in order to demonstrate the problem I use 'git status'
# as the command
~/jelmer/$ EDITOR=true git rebase -i --exec 'git status' HEAD^^
Executing: git status
rebase in progress; onto 5a0c331
You are currently editing a commit while rebasing branch
'jelmer-submodules' on '5a0c331'.
(use "git commit --amend" to amend the current commit)
(use "git rebase --continue" once you are satisfied with your changes)
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working
directory)
modified: third_party/zlib (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
Executing: git status
rebase in progress; onto 5a0c331
You are currently editing a commit while rebasing branch
'jelmer-submodules' on '5a0c331'.
(use "git commit --amend" to amend the current commit)
(use "git rebase --continue" once you are satisfied with your changes)
nothing to commit, working directory clean
Successfully rebased and updated refs/heads/jelmer-submodules.
- This means that the submodules are not changed in rebase steps.
- Let's check whether 'git reset --hard HEAD^' can revert the change
~/jelmer/$ git reset --hard HEAD^
HEAD is now at 5fbe62f Change zlib to a submodule.
~/jelmer/$ echo $?
0
~/jelmer/$ git status
On branch jelmer-submodules
Your branch is up-to-date with 'origin/submodules'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working
directory)
modified: third_party/zlib (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
- git reset --hard HEAD^ doesn't report any error, even though it doesn't
reset everything
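Since the reset leaves the submodule working tree alone, it has to be brought back in line explicitly after every command that moves HEAD. A minimal sketch of such a wrapper follows. The function name is hypothetical, and --force discards local changes inside the submodules, which matches the intent of a hard reset but is worth knowing.

```shell
#!/bin/sh
# reset_hard_with_submodules is a hypothetical wrapper, not a git command.
# Reset the superproject to <commit>, then check out in every submodule
# the commit recorded by the new HEAD. --force throws away local changes
# in the submodules, mirroring what --hard does for normal files.
reset_hard_with_submodules() {
    git reset --hard "$1" &&
        git submodule update --init --recursive --force
}
```

The same 'git submodule update --init --recursive' can be put in front of the command given to 'git rebase -i --exec', e.g. --exec 'git submodule update --init --recursive && make -j'. Both variants still need the submodule objects to be available locally, so they do not help with Problem 1.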
Problem 3.) Problem 1. + 2. both destroy a workflow using a bare proxy
repository.
- My setup looks like this (some others use a similar setup).
~/.gitconfig has this:
[core]
logallrefupdates = true
[gc]
auto = 0
autopacklimit = 0
packrefs = false
autopackrefs = 0
# NOTE: never is available since git 1.7.0.3
# older versions need '9000 days'
#
# Usage of just an int value '9000'
# will be mapped to 'now' and 'git gc'
# will delete all your reflogs!!!!
#
reflogexpire = never
reflogexpireunreachable = never
# NOTE: '9000 days' doesn't work here
#
# fatal: bad config value for 'gc.rerereresolved' in
# /home/metze/.gitconfig
# error: failed to run rerere
#
rerereresolved = 9000
rerereunresolved = 9000
~/samba/bare.git is a repository created with 'git --bare init'
- git remote add origin git://git.samba.org/samba.git
- For each developer I run 'git remote add developer-wip
git://git.samba.org/developer/samba.git'
in order to track remote branches.
- A cron job runs every 5 mins and runs 'git remote update', which
means I have
all commits of all developers available in that bare repo.
~/samba/templare.git is a repo generated with:
- git init
- ln -s ~/samba/bare.git/refs/remotes .git/refs/remotes
- echo "~/samba/bare.git/objects" > .git/objects/info/alternates
- git config "remote.local.url" "~/samba/bare.git"
- git config "remote.local.fetch"
"+refs/remote/origin/*:refs/remote/origin/*"
- git config --add "remote.local.fetch" "+refs/tags/*:refs/tags/*"
I have about 10 working repositories created like this:
cp -a ~/samba/templare.git ~/samba/v4-2-test
cd ~/samba/bare.git
git remote add local-v4-2-test ~/samba/v4-2-test
cd ~/samba/v4-2-test
That means that the following works in all working repositories
completely offline! (The cron job fetches only local repositories
while being offline).
git branch -a # shows all branches of all remotes tracked in the
bare repo
git show <commit> # works for any commit ever fetched into the bare repo
git reset --hard <commit> # restores the full working directory for
any commit
Summary:
- I think these problems are way too much trouble compared to what
submodules can solve.
- What is the goal we want to solve? I think commits to third_party/
subdirectories should be rejected.
- I think we should have improved scripts to import third_party code.
- One script would calculate a checksum of all content and the directory
structure in third_party/project, which would be stored in
third_party/project.checksums
- Instead of just overwriting the target directory with rsync,
the import script should also run the script to generate
third_party/project.checksums. And it should create a commit
for the import, which includes the repository URL and the commit
used. It should also verify that the working tree was clean before the
commit.
- The autobuild script would also generate third_party/project.checksums
and the clean-source-tree.sh script would reject a modification
which didn't use the import script.
- I have started with this, but it's work in progress
https://git.samba.org/?p=metze/samba/wip.git;a=shortlog;h=refs/heads/master3-submodule-problems
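The checksum step described in the list above could look roughly like this. It is only an illustration and not the code from the branch linked above: the function name, the output format and the choice of sha256sum (GNU coreutils) are assumptions, and file names containing newlines are not handled.

```shell
#!/bin/sh
# generate_checksums is a hypothetical sketch; name, output format and
# hash are assumptions. Prints one line per directory and one line per
# regular file below <dir>, in a stable order, so that both content and
# directory structure are covered.
generate_checksums() {
    (
        cd "$1" || exit 1
        # Directory structure first, so empty directories count too.
        find . -type d | LC_ALL=C sort | sed 's/^/dir  /'
        # Then one SHA-256 line per file, sorted by path.
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            sha256sum "$f"
        done
    )
}
```

It would be used as 'generate_checksums third_party/project > third_party/project.checksums'. A check in autobuild could then regenerate the list and compare it with the committed file, e.g. with cmp, and fail on any difference.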
Sorry, but I'm really against using submodules in Samba.
metze
On 24.05.2015 at 17:48, Jelmer Vernooij wrote:
> On Thu, May 21, 2015 at 02:45:48PM +0200, Andrew Bartlett wrote:
>> On Mon, 2015-05-18 at 20:23 +0000, Jelmer Vernooij wrote:
>>> This patchset improves our handling of submodules in our build system,
>>> by preventing developers build with outdated submodules if there are
>>> any present. This should hopefully address the concerns that Andrew
>>> raised in the last thread.
>>>
>>> While I was there, I also removed support for running Samba from a
>>> bzr repository rather than Git. I'm pretty sure I was the only who
>>> ever used that.
>>
>> Thanks Jelmer,
>>
>> I've pushed those with my review and then looked with metze to see what
>> we could do to make this work. He can speak for himself, but when
>> demonstrating his concerns to me the major issue that he hit was the
>> boundary conditions when moving back and forth (such as in a rebase)
>> over the boundary where a submodule is put into use, with untracked
>> files appearing.
> That should be a temporary issue while we migrate; once we have it's
> only a minor nuisance (you have to rm -rf third_party/dnspython before
> switching to an old branch).
>
> As a workaround, we could add the submodule in a different path
> than where the original code used to live so they don't conflict. E.g.
> bundled/dnspython rather than third_party/dnspython.
>
>> We did however show that the issues around initialisation of the
>> submoudle git modules could be handled fairly well by the fact that
>> submodule objects live in the .git directory, or a helper script for the
>> original git clone/fetch.
> What specifically doesn't work there that isn't handled by
> --recursive?
>
> Cheers,
>
> Jelmer
>