samba3 to samba4 migration a success story (detailed version) and thanks a lot to abartlet

Thorsten Trautwein-Veit thorsten.trautwein-veit at schulergroup.com
Mon May 7 06:29:26 MDT 2012


I want to inform the list about the successful migration of our Samba 3
domain.

Maybe someone doing something similar will find some pointers here.
Please keep in mind that all information depends on my machines and
may not fit your needs!

I started on 07.04.12 (the German Easter weekend). I had to migrate our
PDC (Samba 3.4.3) and BDC (Samba 3.5.3) with LDAP backend, our main
file member server (Samba 3.6.1, OpenLDAP backend), two calculation
machines (number crunchers) with the same Samba versions and OpenLDAP
backends, and one DXM (Data eXchange Manager).
Our DCs hold 201 user accounts, 117 groups and 268 computer accounts.
The operating system on all servers is Debian squeeze. Before changing
our production domain I tested the whole upgrade process in VMware (but
I was hit by reality later).
Our PDC is a Xen virtual machine and our BDC is real hardware; more on
this later.
On Saturday I shut down all remaining clients and took a last backup
before I started. It was very handy to have a current LDIF of my
OpenLDAP directory. I followed the steps in
https://wiki.samba.org/index.php/Samba4/HOWTO
and started with our PDC.

Step 1 - Download Samba4
I used "git clone http://gitweb.samba.org/samba.git samba-master; cd
samba-master".
Hint: do not forget to export your proxy, something like this:
"export
http_proxy=http://<username>:<password>@proxy.your-firm.something<:3128>"
The clone takes a while, depending on your Internet bandwidth.

Step 2- Compile Samba4
I got compile errors with "./configure.developer
--prefix=/usr/local/samba-4-20120403" and was advised via IRC to add the
parameter "--abi-check-disable". The main thing was that I am using a
different version of gdb on my system and the ABI checks seem to be
tied to a specific gdb version. I got the compile error only in my
original x64 environment; under 32 bit everything compiled fine.

Step 3 - Install Samba4
A plain "make install" did what it should. In my case I linked my Samba
installation to /usr/local/samba for my own convenience with
"ln -s /usr/local/samba-4-20120408 /usr/local/samba". It is just for
laziness and easy updating of my installation.
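
The versioned-prefix-plus-symlink layout can be sketched as a small
helper. This is only an illustration: the function name switch_prefix is
made up here, and "ln -sfn" assumes GNU coreutils as shipped with Debian.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): point a stable
# symlink at a versioned install directory, e.g.
#   /usr/local/samba -> /usr/local/samba-4-20120408
switch_prefix() {
    target=$1   # versioned install directory
    link=$2     # stable path used in configs and scripts
    [ -d "$target" ] || { echo "no such directory: $target" >&2; return 1; }
    # -n: treat an existing link to a directory as a plain file
    # -f: replace the existing link
    ln -sfn "$target" "$link"
}

# Example (commented out so nothing changes by accident):
# switch_prefix /usr/local/samba-4-20120408 /usr/local/samba
```

Updating then means installing into a new dated prefix and switching the
link, so the old build stays around as a fallback.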

I skipped Step 4 (Provision Samba4) through Step 7 because I had to do
an upgrade and not a new install.

So after Step 3 I continued with

Step 8 - Configure DNS
Because Debian's bind packages are too old for my needs I downloaded
bind9_9.8.1.dfsg.P1.orig.tar.gz and compiled it with:
"./configure --with-dlz-dlopen=yes --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --sysconfdir=/etc/bind --localstatedir=/var
--enable-threads --enable-largefile --with-libtool --enable-shared
--enable-static"
--with-dlz-dlopen=yes is essential so that Samba's
samba/lib/bind9/dlz_bind9.so can do dynamic nameserver updates.

I edited the following files to my needs:
/etc/bind/named.conf.options:

options {
        directory "/var/cache/bind";

        // If there is a firewall between you and nameservers you want
        // to talk to, you may need to fix the firewall to allow multiple
        // ports to talk.  See http://www.kb.cert.org/vuls/id/800113

        // If your ISP provided one or more IP addresses for stable
        // nameservers, you probably want to use them as forwarders.
        // Uncomment the following block, and insert the addresses replacing
        // the all-0's placeholder.

        forwarders {
                153.3.XX.XX;
        };

        auth-nxdomain no;    # conform to RFC1035
        listen-on-v6 { any; };

        allow-recursion { any; };
        allow-query { any; };
        allow-query-cache { any; };

        tkey-gssapi-keytab "/usr/local/samba/private/dns.keytab";
        //tkey-gssapi-credential "DNS/wzbgpdc1.sctg.schuler.de";
        //tkey-domain "SCTG.SCHULER.DE";
};
The forwarders statement points to our main enterprise DNS server, which
delegates all zones. It is needed to resolve all other names. Everything
else is as described in the HOWTO.

/etc/bind/named.conf.local:

//
// Do any local configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";
include "/usr/local/samba/private/named.conf";

/usr/local/samba/private/named.conf:

# This DNS configuration is for BIND 9.8.0 or later with dlz_dlopen support.
#
# This file should be included in your main BIND configuration file
#
# For example with
# include "/usr/local/samba-4-20120408/private/named.conf";

#
# This configures dynamically loadable zones (DLZ) from AD schema
#
dlz "AD DNS Zone" {
    database "dlopen /usr/local/samba/lib/bind9/dlz_bind9.so";
};

and finally
/etc/resolv.conf:
domain sctg.schuler.de
search sctg.schuler.de
nameserver 127.0.0.1
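
To check that the BIND/DLZ setup answers for the domain, the lookups an
AD client depends on can be listed with a small helper. This is a sketch:
the function name dns_checks is made up here, and the host name wzbgpdc1
in the usage line is taken from the configs in this post.

```shell
#!/bin/sh
# Print the "host" lookups worth trying against the new DNS setup.
# The SRV record names are the standard AD ones; the helper itself is
# hypothetical and only prints commands, it does not run them.
dns_checks() {
    domain=$1   # DNS domain / realm in lower case
    dc=$2       # short host name of the domain controller
    echo "host -t SRV _ldap._tcp.$domain."
    echo "host -t SRV _kerberos._udp.$domain."
    echo "host -t A $dc.$domain."
}

# Review the commands first, then run them on the DC, e.g.:
# dns_checks sctg.schuler.de wzbgpdc1 | sh
```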

Step 9 - Testing kerberos
For Kerberos I linked /etc/krb5.conf to
/usr/local/samba/private/krb5.conf:
"ln -s /usr/local/samba/private/krb5.conf /etc/krb5.conf"
krb5.conf:
[libdefaults]
        default_realm = SCTG.SCHULER.DE
        dns_lookup_realm = false
        dns_lookup_kdc = true

Step 10 - Configure kerberos DNS dynamic updates (optional)
This was already done in Step 8.

Step 11 - Configure NTP (optional)
I think you really need NTP for a Samba 4 installation with more than
one member. In my opinion and setup, having NTP work correctly is a must
on every member server of my Samba 4 domain because Kerberos depends on
synchronized clocks.

/etc/ntp.conf
server timeserver.your-company.whatever
driftfile /var/lib/ntp/ntp.drift
server 127.127.1.1 version 3
fudge  127.127.1.1 stratum 12
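
Why the clocks matter: Kerberos by default tolerates a clock difference
of five minutes (300 seconds) between client and KDC. A minimal sketch of
that check, with a made-up function name and plain epoch seconds as
input (how you obtain the remote timestamp is up to you):

```shell
#!/bin/sh
# Hypothetical helper: succeed if two epoch timestamps are within the
# allowed skew (default 300 seconds, the usual Kerberos tolerance).
skew_ok() {
    local_ts=$1
    remote_ts=$2
    max=${3:-300}
    diff=$((local_ts - remote_ts))
    # absolute value
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$max" ]
}

# Example: compare "date +%s" output collected on two machines
# skew_ok "$(date +%s)" "$remote_epoch" || echo "clock skew too large"
```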

After this I started with
https://wiki.samba.org/index.php/Samba4/samba3upgrade/HOWTO
to migrate my existing users, groups and machine accounts. I followed the
"Upgrading in Place" guide because I had tested it in my VMware network.

I had to clean up my LDAP directory a little, because
/usr/local/samba/bin/samba-tool domain samba3upgrade complained about
various things such as uids used twice; I had more than one root
account in it and similar problems.
To test the upgrade I copied all Samba 3 "*.tdb" files to "/tmp/tdb" and
ran
"./samba-tool domain samba3upgrade --dbdir=/tmp/tdb --use-xattrs=yes
--realm=sctg.schuler.de /usr/local/samba_3.4.3/lib/smb.conf"
and fixed my LDAP entries one by one.
Once I had imported all users and machines successfully, I deleted my
Samba 4 install, installed it again and ran the upgrade one last time.
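
Duplicate ids of the kind samba3upgrade complains about can be found in
an LDIF dump before running the upgrade. A minimal sketch, assuming the
dump contains plain (not base64-encoded) "uidNumber: N" lines; the
function name is made up here:

```shell
#!/bin/sh
# Hypothetical helper: list uidNumber values that occur on more than
# one entry of an LDIF file.
dup_uidnumbers() {
    awk -F': ' '$1 == "uidNumber" { n[$2]++ }
                END { for (u in n) if (n[u] > 1) print u }' "$1" | sort -n
}

# Example:
# dup_uidnumbers /root/backup.ldif
```

Each number printed belongs to at least two entries, for example two
root accounts, and has to be cleaned up by hand.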

All of this took about 4 hours of work, but that depends on your
Internet connection, the compute power of your machine and so on.

Our BDC was installed following
https://wiki.samba.org/index.php/Samba4/HOWTO without doing a provision,
because all information is replicated to my second domain controller.

I edited its smb.conf file to match the domain settings on my PDC, which
were generated by the samba3upgrade process.
PDC smb.conf:
# Global parameters
[global]
        server role = domain controller
        workgroup = SCTG
        realm = sctg.schuler.de
        netbios name = WZBGPDC1
        passdb backend = samba4
        server string = sctg ad dc1
        log level = 1
        domain logons = yes
        wins support = yes
        private dir = /usr/local/samba/private
        ncalrpc dir = /usr/local/samba/var/run/ncalrpc
        winbindd socket directory = /usr/local/samba/var/run/winbindd
        winbindd privileged socket directory = /usr/local/samba/var/lib/winbindd_privileged
        ntp signd socket directory = /usr/local/samba/var/run/ntp_signd
        dns update command = /usr/local/samba/sbin/samba_dnsupdate
        spn update command = /usr/local/samba/sbin/samba_spnupdate
        samba kcc command = /usr/local/samba/sbin/samba_kcc
        lock dir = /usr/local/samba/var/lock
        state directory = /usr/local/samba/var/locks
        cache directory = /usr/local/samba/var/cache
        pid directory = /usr/local/samba/var/run
        wins server =

[netlogon]
        path = /usr/local/samba/var/locks/sysvol/sctg.schuler.de/scripts
        read only = No

[sysvol]
        path = /usr/local/samba/var/locks/sysvol
        read only = No

On our BDC:
# Global parameters
[global]
        server role = domain controller
        workgroup = SCTG
        realm = sctg.schuler.de
        netbios name = WZBGPDC2
        passdb backend = samba4
        log level = 2

[netlogon]
        path = /usr/local/samba-4-20120408/var/locks/sysvol/sctg.schuler.de/scripts
        read only = No

[sysvol]
        path = /usr/local/samba-4-20120408/var/locks/sysvol
        read only = No

BDC /etc/krb5.conf :
[libdefaults]
        default_realm = SCTG.SCHULER.DE
        dns_lookup_realm = true
        dns_lookup_kdc = true

[realms]
        SCTG.SCHULER.DE = {
                kdc = wzbgpdc1.sctg.schuler.de:88
                admin_server = wzbgpdc1.sctg.schuler.de:749
                default_domain = sctg.schuler.de
        }

[domain_realm]
        .sctg.schuler.de = SCTG.SCHULER.DE
        sctg.schuler.de = SCTG.SCHULER.DE


BDC /etc/resolv.conf:
domain sctg.schuler.de
search sctg.schuler.de
nameserver 153.3.xxx.xxx
The nameserver entry currently points to the BIND on the PDC.

I joined the BDC to the running PDC with: "samba-tool domain join
sctg.schuler.de DC -Uadministrator%<password> --realm=sctg.schuler.de -d2"
After starting Samba 4 on the BDC with
"./samba -i -M single -d2" I saw with "samba-tool drs showrepl" that
both DCs had started replicating. Have a close look at the lines
reading
"0 consecutive failure(s)". It takes some time to replicate all data
from one DC to the other.
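
Watching the "consecutive failure(s)" lines can be automated. The sketch
below reads "samba-tool drs showrepl" output on stdin and reports every
such line whose counter is not 0. The function name is made up, and it
assumes the line format quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print replication links with a non-zero failure
# counter and return 1 if there is at least one.
repl_failures() {
    awk '/consecutive failure\(s\)/ && $1 != 0 { print; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}

# Example:
# /usr/local/samba/bin/samba-tool drs showrepl | repl_failures \
#     || echo "replication problems, see lines above"
```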

To get a redundant setup (Samba 4 AND nameserver) I tried what was
suggested on the samba-technical mailing list and ran
samba_upgradedns.

This was the first thing that did not work. I saw in the log that
"dreplsrv_partition[DC=DomainDnsZones,DC=sctg,DC=schuler,DC=de] loaded"
started to replicate my DNS zones, but I got no DNS records. I must say
that I did not spend much time on solving this step. I left it broken
because I thought I needed my time to migrate more of the domain. I was
past the point of no return.
[Later I followed a thread on the technical mailing list where two
people, Daniele and Andreas, were fighting the same problem and did not
solve it. Is this right? I think so.]

Then it was time to migrate the logon scripts. I copied them from my old
netlogon share to the new netlogon share. On the first attempt I forgot
to set the right owner of the logon script and the ACLs that let the edv
group edit the file in any case. Keep in mind that it is your job to
replicate the contents of the sysvol share. I use unison
http://www.cis.upenn.edu/~bcpierce/unison/ but maybe csync is easier and
will do the same job http://www.csync.org/.

Then I created my init.d start script for Samba 4.
/etc/init.d/samba:
#!/bin/sh

### BEGIN INIT INFO
# Provides:          samba
# Required-Start:    $network $local_fs $remote_fs
# Required-Stop:     $network $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: start Samba daemons
### END INIT INFO

#
# Start/stops the Samba daemon (samba).
# Adapted from the Samba 3 packages.
#

SAMBAPID=/var/run/samba/samba.pid

# clear conflicting settings from the environment
unset TMPDIR

# See if the daemon and the config file are there
test -x /usr/local/samba/sbin/samba -a -r /usr/local/samba/etc/smb.conf || exit 0

. /lib/lsb/init-functions

case "$1" in
        start)
                log_daemon_msg "Starting Samba 4 daemon" "samba"

                if ! start-stop-daemon --start --quiet --oknodo --exec \
                        /usr/local/samba/sbin/samba -- -D; then
                        log_end_msg 1
                        exit 1
                fi

                log_end_msg 0
                ;;
        stop)
                log_daemon_msg "Stopping Samba 4 daemon" "samba"

                start-stop-daemon --stop --quiet --name samba $SAMBAPID
                # Wait a little and remove stale PID file
                sleep 1
                if [ -f $SAMBAPID ] && ! ps h `cat $SAMBAPID` > /dev/null
                then
                        # Stale PID file (samba was successfully stopped),
                        # remove it (should be removed by samba itself
                        # IMHO.)
                        rm -f $SAMBAPID
                fi

                log_end_msg 0

                ;;
        restart|force-reload)
                $0 stop
                sleep 1
                $0 start
                ;;
        *)
                echo "Usage: /etc/init.d/samba {start|stop|restart|force-reload}"
                exit 1
                ;;
esac

exit 0

It is mostly borrowed from
http://www.bryanpopham.com/tutorials/Samba4PDCWin7WinXP.html#make with
many thanks.


After I had checked that I was able to log in to a workstation with my
(old) password and that my logon script was executed, I started to
migrate our main file server.

I updated the file server to Samba 3.6.3 at this point and got a bunch
of problems.
While my attention was focused on Samba 4 I had missed that the idmap
backend syntax and usage had changed. So I tried without success to
integrate the main file server into the domain. I stopped working on it
after a few hours, when winbind constantly failed to return my new
Samba 4 ADS users and groups.
As it was really late, I went home for some sleep and wanted to try
again the next day.

On the next day (08.04.12) my idea (I had no clue why I could not get
Samba 3 running on the file server) was to use Samba 4 on the
file server as well. This approach was promising because after 2 hours I
had joined our file server to the ADS domain. But while I tried to
migrate my S3 smb.conf I noticed that Samba 4 does not support "ms dfs",
on which many things depend here.
So I went back to the mailing lists searching for my issue. I found
this bug: https://bugzilla.samba.org/show_bug.cgi?id=8371 which is
shown as fixed in Samba 3.6.3, but I had very similar problems. wbinfo
-p worked, wbinfo -t worked, wbinfo -i <username> worked, but getent,
chown and chgrp were failing. Because I did not want to lose any more
time I went backwards through the git commit logs, searching for the
last version of Samba 3 before the change of the idmap interface, and
started over with Samba 3.5.13. The disadvantage is that I was not able
to use the SMB2 protocol, which really speeds up my Windows 7 clients.
With this version I was able to get all user and group information from
winbindd and use it in commands like chown.

BUT I had missed that all uids and gids had changed. So every file on
our file server showed the old uid number instead of the username it
belongs to. And even worse, all ACLs had the same problem. I found no
easy way to map the old LDAP uids and gids to the new ADS users, so I
started to fix owner and group with this script:
#!/bin/sh
#set -x

#targetdir=/pcdaten/admin
#targetdir=/pcdaten/scan
#targetdir=/pcdaten
#targetdir=/dxm
targetdir=/dat01/simufact
#targetdir=/dat01/simufact

grouplist=`cat /root/gruppenliste.txt`
userlist=`cat /root/userliste.txt`

for item in $grouplist
do
        gid=`echo $item | cut -f 2 -d :`
        group=`echo $item | cut -f 1 -d :`
        find $targetdir -gid $gid -print0 |
                xargs -0 chown --from=:$gid :"$group"
done

for item in $userlist
do
        uid=`echo $item | cut -f 2 -d :`
        user=`echo $item | cut -f 1 -d :`
        find $targetdir -uid $uid -print0 |
                xargs -0 chown --from=$uid $user
done

The group list (gruppenliste.txt) was created with "getent group | cut -f
1,3 -d : >gruppenliste.txt" on a virtual clone of my old PDC. The
user list (userliste.txt) was created on the VMware machine with
"getent passwd | cut -f 1,3 -d : >userliste.txt".
gruppenliste.txt:
domuser:23002
konstr:24003
sctgcmswiki:24006
prj-m:24004
vertrieb:24005
...

userliste.txt:
domuser:23002
konstr:24003
sctgcmswiki:24006
prj-m:24004
vertrieb:24005
arbeitsvorbereitung:24007
....


Note: you can use -exec in the find command instead, but xargs was much
faster in my case.
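
Before touching several TiB it can help to print the planned commands
first. The sketch below reads a "name:id" list like gruppenliste.txt
line by line, so group names containing spaces survive, and only prints
one find/chown command per entry. The function name is made up, and the
"-r" flag (do not run chown when find returns nothing) assumes GNU
xargs.

```shell
#!/bin/sh
# Hypothetical dry run: print one find/chown command per "name:id" line.
plan_chgrp() {
    list=$1        # mapping file with lines "groupname:gid"
    targetdir=$2   # directory tree to fix
    while IFS=: read -r name id; do
        # skip blank or incomplete lines
        [ -n "$name" ] && [ -n "$id" ] || continue
        printf 'find %s -gid %s -print0 | xargs -0 -r chown --from=:%s :"%s"\n' \
            "$targetdir" "$id" "$id" "$name"
    done < "$list"
}

# Example: review the output, then pipe it to sh
# plan_chgrp /root/gruppenliste.txt /dat01/simufact
```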

To get my ACLs back I used "getfacl -R -s -p /pcdaten/tebis >tebis.acl"
to read all ACLs under the subtree /pcdaten/tebis and store them in a
file, in this case tebis.acl. To replace the old uids/gids I used this
script:
sctgfs01:~/berechtigungenKorrigieren# setacls.sh tebis.acl
#!/bin/sh
#set -x
grouplist=`cat /root/berechtigungenKorrigieren/gruppenliste.txt`
userlist=`cat /root/berechtigungenKorrigieren/userliste.txt`
cp $1 $1neu.txt

for item in $grouplist
do
        gid=`echo $item | cut -f 2 -d :`
        group=`echo $item | cut -f 1 -d :`
        # match the trailing ":" too, so gid 2400 does not hit 24003
        sed "s/group:$gid:/group:$group:/g" $1neu.txt >$1run.txt
        cp $1run.txt $1neu.txt
        rm $1run.txt
done

for item in $userlist
do
        uid=`echo $item | cut -f 2 -d :`
        user=`echo $item | cut -f 1 -d :`
        # match the trailing ":" too, so uid 2400 does not hit 24003
        sed "s/user:$uid:/user:$user:/g" $1neu.txt >$1run.txt
        cp $1run.txt $1neu.txt
        rm $1run.txt
done

The script creates a new file, in this case "tebis.aclneu.txt". After
checking that everything is OK (vimdiff tebis.acl tebis.aclneu.txt),
"setfacl --restore=tebis.aclneu.txt" sets the corrected ACLs in the
file system.
I did this for 6 TiB and it took many, many hours.
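
The same rewrite can be done in a single pass over the ACL dump instead
of one sed run per user and group. This is a sketch under assumptions,
not the script I used: it expects the "name:id" lists from above and
getfacl entries of the form "user:ID:perms", "group:ID:perms" and their
"default:" variants, and like the sed version it leaves the "# owner:"
and "# group:" comment lines alone. The function name is made up.

```shell
#!/bin/sh
# Hypothetical one-pass variant: replace numeric ids in a getfacl dump
# with names, comparing whole fields so that 2400 never matches 24003.
rewrite_acl() {
    users=$1    # lines "username:uid"
    groups=$2   # lines "groupname:gid"
    acl=$3      # output of getfacl -R
    awk -F: -v OFS=: '
        FILENAME == ARGV[1] { if ($2 != "") u[$2] = $1; next }
        FILENAME == ARGV[2] { if ($2 != "") g[$2] = $1; next }
        /^#/ { print; next }                  # "# file:" etc. unchanged
        {
            i = ($1 == "default") ? 2 : 1     # position of the tag field
            if ($i == "user"  && ($(i+1) in u)) $(i+1) = u[$(i+1)]
            if ($i == "group" && ($(i+1) in g)) $(i+1) = g[$(i+1)]
            print
        }' "$users" "$groups" "$acl"
}

# Example:
# rewrite_acl userliste.txt gruppenliste.txt tebis.acl >tebis.aclneu.txt
```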

It was again time to hurry for a short nap at home :)

09.04.12
My colleague joined in and from now on we were working together to
get everything ready for 10.04.12, when the early shift starts at 04:00.
I had several file systems where I had to fix the uid/gid information
and faced the fact that winbind stopped working all of a sudden. The
processes were running but no domain info was returned. Stopping and
starting winbindd got it working again, but I feared that this would
happen (and it does happen from time to time) while my users work.
So I wrote a little monitoring script for winbind.
/usr/local/bin/checkwinbind.sh:
sctgfs01:~/berechtigungenKorrigieren# cat /usr/local/bin/checkwinbind.sh
#!/bin/sh
#set -x
lokal=`cat /etc/group | wc -l`
netz=`getent group | wc -l`

#echo $lokal
#echo $netz

if [ ! $netz -gt $lokal ]
then
        echo "!! winbind ausgefallen !!"
        date
        /etc/init.d/winbind stop
        sleep 3
        winbindcount=`ps -ef | grep /usr/local/samba/sbin/winbind | wc -l`
        while [ $winbindcount -gt 1 ]
        do
                ps -ef | grep /usr/local/samba/sbin/winbind | tr -s ' ' |
                        cut -f 2 -d ' ' | xargs kill -9
                sleep 1
                winbindcount=`ps -ef |
                        grep /usr/local/samba/sbin/winbind | wc -l`
        done
        sleep 1
        /etc/init.d/winbind start
fi

It is based on the idea that there are fewer groups in /etc/group
than in the ADS. So I compare the line counts of "getent group" and "cat
/etc/group" and assume that if "getent group" does not return more lines
than "cat /etc/group", something is wrong with winbind, so I kill it and
restart it.

While setfacl was running I took some time to have a deeper look into
the DNS replication between my DC1 and DC2 and noticed that my
RID manager role was, hmmm, absent. To check this I used "samba-tool fsmo
show", which returned only an error. But for me Samba 4 was working
so far, so I focused on the topics with the highest priority.

On 10.04.2012 all of my users were able to log in and work. The member
file servers were included in the main file server via NFSv4 (with
ACLs) and corrected DFS entries. But I dug deeper into my installation,
searching for "where is my RIDManager" and for how I could get redundant
DNS services.

To make it short, I got really a lot of errors (more than 180) when
running "make quicktest" on my DC1 (Xen virtual machine), while the
same sources completed "make quicktest" on my DC2 with "ALL OK".

On 12.04.12 I asked on IRC in #samba-technical for help with this
issue, and Mr. abartlet (thanks a lot again) helped me. First, it is
quite uncommon for quicktest to fail; if it does, something weird is
going on. It took me endless painful recompiles on both DCs to find out
the following:
I was using a 2.6.35 kernel on the Xen host and a 2.6.26-2 kernel in
my DC1 (domU). Getting a more recent kernel into my domU DC1 solved a
few of the issues "make quicktest" was throwing. The real problem was
the Broadcom adapter I was using. I had TCP checksum offloading by the
network card enabled, which does not work for me. I switched it off
using "ethtool" in
/etc/rc.local :
/sbin/ethtool -K eth0 tx off
/sbin/ethtool -K eth0 rx off
/sbin/ethtool -K eth0 gso off

/sbin/ethtool -K eth1 tx off
/sbin/ethtool -K eth1 rx off
/sbin/ethtool -K eth1 gso off

From then on I was able to run "make quicktest" on this
machine without any hassle. Mr. abartlet patched "samba-tool dbcheck
--fix" for me to get my RID manager role back. And that is what I use up
to now.
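
The rc.local lines above follow one pattern per interface, so they can
be generated by a loop. A small sketch with a made-up function name; it
only prints the ethtool commands (the same tx, rx and gso switches as
above) and leaves running them to you.

```shell
#!/bin/sh
# Hypothetical helper: print the ethtool calls that switch off checksum
# and segmentation offloading for each interface given as argument.
offload_cmds() {
    for ifc in "$@"; do
        for feat in tx rx gso; do
            echo "/sbin/ethtool -K $ifc $feat off"
        done
    done
}

# Example for /etc/rc.local:
# offload_cmds eth0 eth1 | sh
```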

I am working on an LDAP idmap backend with Samba version 3.6.4 to
integrate my other file servers as well and get rid of the NFS mounts,
which I don't like. I have two of my servers using an OpenLDAP server as
backend. The important lines in smb.conf are:
[global]
        dos charset = ISO8859-1
        unix charset = ISO8859-1
        workgroup = SCTG
        netbios name = wzbgpsf1
        security = ads
        realm = SCTG.SCHULER.DE


        winbind enum users = yes
        winbind enum groups = yes
        winbind use default domain = no

        ldap idmap suffix                       = ou=idmap
        ldap ssl = no
        idmap backend                           = ldap

        idmap config * : range          = 700001 - 800000
        idmap config * : backend        = tdb

        idmap config sctg : backend             = ldap
        idmap config sctg : range               = 40000-700000
        idmap config sctg : ldap_url            = ldap://sctgfs01.schuler.de
        idmap config sctg : ldap_base_dn        = ou=idmap,dc=sctg,dc=schuler,dc=de
        idmap config sctg : ldap_user_dn        = cn=admin,dc=sctg,dc=schuler,dc=de

        template homedir = /homeu/%U
        template shell = /bin/bash

        wins server = 153.3.131.119

        max protocol = smb2

But this needs more testing. It was not possible to get the LDAP
populated from a /usr/local/samba/var/locks/winbindd_cache.tdb file, so
I had to do it by hand.
I used a script like this on my OpenLDAP server:
sctgfs01:~/ldap/changeid# cat change_idmap.sh
#!/bin/sh
set -x
SAMBABIN=/usr/local/samba/bin
pwdlist=`getent passwd`
userlist=`$SAMBABIN/wbinfo -u`

for user in $userlist
do
        sid=`$SAMBABIN/wbinfo -n $user | cut -f 1 -d ' '`
        uid=`getent passwd | grep $user: | cut -f3 -d :`

        dn=`ldapsearch -LLL "(sambaSID=$sid)" -x dn >$uid.ldif`
        head --lines=2 $uid.ldif >$uid-2.ldif
        echo "changetype: modify" >>$uid-2.ldif
        echo "replace: uidNumber" >>$uid-2.ldif
        echo "uidNumber: $uid"  >>$uid-2.ldif
        ldapmodify -w LuckyStrice -D cn=admin,dc=sctg,dc=schuler,dc=de \
                -x -f $uid-2.ldif
done
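
The LDIF that the loop assembles with head and echo can also be written
by one small function, which makes it easy to look at before it is sent
to the server. A sketch with a made-up function name; the DN has to come
from your own ldapsearch, and "-W" in the usage line makes ldapmodify
prompt for the password instead of taking it on the command line.

```shell
#!/bin/sh
# Hypothetical helper: emit an LDIF modify record that replaces the
# uidNumber of one entry.
uidnumber_ldif() {
    dn=$1    # full DN of the idmap entry
    uid=$2   # uid number the file server should use
    printf 'dn: %s\nchangetype: modify\nreplace: uidNumber\nuidNumber: %s\n' \
        "$dn" "$uid"
}

# Example ("$dn" and "$uid" as determined in the loop above):
# uidnumber_ldif "$dn" "$uid" | \
#     ldapmodify -x -W -D cn=admin,dc=sctg,dc=schuler,dc=de
```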

To check whether users and groups match on my file server I use this script:
wzbgpsf1:~# cat checkidmap.sh
#!/bin/sh
set -x
mkdir -p /dat01/simufact/tht/test/user/
userlist=`wbinfo -u`

for user in $userlist
do
        touch /dat01/simufact/tht/test/user/$user
        chown $user /dat01/simufact/tht/test/user/$user
done


mkdir -p /dat01/simufact/tht/test/gruppen/
grouplist=`wbinfo -g`

for group in $grouplist
do
        touch /dat01/simufact/tht/test/gruppen/$group
        chgrp $group /dat01/simufact/tht/test/gruppen/$group
done

I mount the file system with the files I created and chowned/chgrped
via NFS on my file server and check there whether each file name matches
the owner/group of the file.
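
The final comparison can be scripted as well. A minimal sketch for the
user directory, assuming GNU stat ("stat -c %U" prints the owner name as
the local system resolves it); the function name is made up here.

```shell
#!/bin/sh
# Hypothetical helper: report every file in a directory whose owner name
# differs from its file name.
check_owner() {
    dir=$1
    for f in "$dir"/*; do
        [ -e "$f" ] || continue
        name=${f##*/}
        owner=$(stat -c %U "$f")
        [ "$owner" = "$name" ] || echo "MISMATCH $name owned by $owner"
    done
}

# Example, run on the machine that mounts the test directory:
# check_owner /dat01/simufact/tht/test/user
```

No output means that every file is owned by the user it is named after.
For the group directory the same loop works with "stat -c %G".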

When I am sure that the idmap in OpenLDAP works I will upgrade all
member file servers to this configuration and Samba version.

I will update my Samba 4 domain controllers as soon as DNS
replication is included or "Group Policy Preferences" are available.

I hope this post will help someone who is migrating to Samba 4. If
anyone has further questions about this post, contact me on IRC; my
nick is ttv.

I want to thank the Samba team for their good work and for having an
open ear for a normal administrator.

Thanks to abartlet for his help.

Cheers all,
have a good time.


-- 
Mit freundlichen Grüßen · Best regards

*Dipl.-Ing. (FH) Thorsten Trautwein-Veit*
/Leitung EDV · IT-Manager/


