status of DRS efforts in Samba4 (and a developer tutorial)

tridge at samba.org
Thu Sep 10 20:32:40 MDT 2009


Prompted by the message from Rodolfo, I thought it might be useful to
give some status on the DRS efforts in Samba4, and a few very rough
instructions on how to test what we've got at the moment.

First some background. For years we have been slowly building up the
ability of Samba4 to be an active directory domain controller. One of
the key features of a DC is the ability to replicate with other DCs in
the same domain, so that if you (for example) add a user on one of the
DCs then that user will appear on the other DCs within a few seconds. 

Samba4 has for quite a while had the ability to be a standalone AD
domain controller, and that is in production use at some sites. That
is a great achievement, but for more widespread use we really need the
ability to do replication, otherwise Samba4 will not be able to be
added as an additional domain controller to existing Windows domains. 

The key to replication is the DRSUAPI RPC protocols. They implement a
multi-master directory service, and have quite complex protocols for
working out what changes need to be sent between DCs.

Stefan Metzmacher (metze) developed a lot of the base code as part of
his thesis work. He did an amazing job, especially since at the time
we didn't have any protocol documentation and he worked it out by
looking at wire traces, and from technet overview docs like this one:

  http://technet.microsoft.com/en-us/library/cc772829(WS.10).aspx

The code metze wrote forms the basis for a lot of the activity in the
Samba4 source tree over the last few weeks. We have finally reached a
point in the development of Samba4 where all the various pieces are
coming together and we can start to make large parts of DRS actually
work. We're hoping to test a lot of this code at the DRS plugfest on
Microsoft campus later this month.

The things we currently have working (as of yesterday!) are as
follows:

 1) as before, we can create a standalone Samba4 domain controller,
 using the usual provision script

 2) now we can use "net vampire" to join another Samba4 machine to the
 domain. The vampire process pulls a copy of the entire directory from
 the first machine and uses it to populate its own directory. It also
 sets up the necessary records for the two machines to use incremental
 replication from then on to keep themselves in sync. The server side
 of this is based on some great work by Anatoliy Atanasov, extended a
 bit by Andrew and myself over the last few days.

 3) we can "net vampire" to both w2k3 and w2k8, and then after the
 vampire is complete Samba can pull incremental changes from the
 windows DC to the Samba database.

We have not yet demonstrated windows pulling incremental changes from
Samba, or at least not successfully. Windows has pulled changes, but
so far it has mostly caused windows to crash soon afterwards! I think
we may have fixed the main bugs that caused that in the last couple of
days, and I am hopeful that we will demonstrate the first successful
replication of samba<->windows in the next week. We are still a long
way from this being ready for production use of course.

Now I'll give a very rough howto for developers who may be interested
in reproducing what we've done. One of the interesting things about
this howto is that it no longer needs a windows box. The ability to
do replication samba4<->samba4 means that interested developers can
now experiment with DRS replication in a Samba-only environment, which
can make debugging and testing much easier. We obviously need to
ensure it works with Windows as well, but initial development of
features in a Samba-only environment can really help development.


Step 1) Get the latest git tree.

You really do need the latest tree. If your tree is just a few hours
old then you could be missing major features! The code is developing
very rapidly.

A basic git checkout would be:

  git clone git://git.samba.org/samba.git samba4


Step 2) build Samba4 as usual. 

Common steps are:

  cd samba4/source4
  ./autogen.sh
  ./configure.developer --prefix=$HOME/prefix.s4
  make
  make install

You might also like to run "make test" or "make quicktest". Be warned
that "make test" can take a long time (over half an hour on my laptop)
and some of the tests will probably fail. "make quicktest" takes just
a couple of minutes and at least lets you know that the git tree you
have works to some extent. If "make quicktest" does not pass
completely then the tree is probably badly broken. 


Step 3) provision your Samba domain controller

You will need to decide what domain name you will use. At home I use a
DNS domain of "bludom.tridgell.net" with a NBT workgroup name of
"bludom". My test machine (where I built Samba4) is called "blu" with
an IP address of 10.0.0.1. I'll use those names in the example below.

You could cd to the source4 build directory again, and run something
like this:

 ./setup/provision --realm=bludom.tridgell.net --domain=bludom --host-name=blu --host-ip=10.0.0.1 --adminpass=penguin --server-role="domain controller"

If that works you'll see something like this:

 Server Role:           domain controller
 Hostname:              blu
 NetBIOS Domain:        BLUDOM
 DNS Domain:            bludom.tridgell.net
 DOMAIN SID:            S-1-5-21-2939463048-3248118054-807635424
 Admin password:        penguin
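
If you want to reuse the provision step with your own names, a small
helper along these lines might be handy. This is only a sketch:
provision_cmd and its argument order are made up for illustration, and
the flags are just the ones from the command above. It only prints the
command line, it does not run anything.

```shell
#!/bin/sh
# Sketch only: print the provision command for a given set of names.
# provision_cmd is a hypothetical helper; the flags are the ones used
# in the provision command shown above.
provision_cmd() {
    realm=$1 domain=$2 host=$3 ip=$4 pass=$5
    printf '%s' "./setup/provision --realm=$realm --domain=$domain"
    printf '%s' " --host-name=$host --host-ip=$ip --adminpass=$pass"
    printf '%s\n' ' --server-role="domain controller"'
}
```

Running "provision_cmd bludom.tridgell.net bludom blu 10.0.0.1 penguin"
prints the same command line as above, which you can then paste into a
shell in the source4 directory.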


Step 4) Set up DNS

As part of the provision above you will end up with a DNS zone file in
$HOME/prefix.s4/private. In my case it's called
bludom.tridgell.net.zone.

You need to install bind9 on your machine, and enable this zone file
so that you can resolve DNS names within the bludom domain.

To do that with a modern bind9 install (I'm using the one in Ubuntu
Jaunty) you would add something like this to
/etc/bind/named.conf.local:

 zone "bludom.tridgell.net" IN {
        type master;
        file "/home/tridge/prefix.s4/private/bludom.tridgell.net.zone";
 };

Then restart bind9. You might also like to do something like this in
the options clause in /etc/bind/named.conf.options:

        forward only;
	forwarders {
	 	192.168.2.10;
	};

where the IP is the address of your LAN's DNS server. That will mean
that your development box will only answer queries for the zone you've
configured, and will forward other queries onto your LAN's DNS server.

Next check that your /etc/resolv.conf points to 127.0.0.1 for your
nameserver and you should be able to run commands like this:

  tridge@blu$ host -t SRV _ldap._tcp.bludom.tridgell.net localhost
  Using domain server:
  Name: localhost
  Address: 127.0.0.1#53
  Aliases:

  _ldap._tcp.bludom.tridgell.net has SRV record 0 100 389 blu.bludom.tridgell.net.
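
If you end up repeating that check a lot, it could be wrapped in a small
function. The following is a rough sketch, not part of Samba: check_srv
and the HOST_CMD override are hypothetical names, and I am assuming the
generated zone file also carries a _kerberos._tcp SRV record alongside
the _ldap._tcp one shown above.

```shell
#!/bin/sh
# Sketch: report whether the SRV records a DC needs are resolvable.
# HOST_CMD is a hypothetical override so the lookup tool can be swapped;
# by default the same "host" command as above is used.
# Assumption: the zone contains both _ldap._tcp and _kerberos._tcp.
check_srv() {
    domain=$1
    server=${2:-localhost}
    rc=0
    for rec in _ldap._tcp _kerberos._tcp; do
        if ${HOST_CMD:-host} -t SRV "$rec.$domain" "$server" 2>/dev/null |
            grep -q 'has SRV record'; then
            echo "ok:      $rec.$domain"
        else
            echo "missing: $rec.$domain"
            rc=1
        fi
    done
    return $rc
}
```

You would call it as "check_srv bludom.tridgell.net localhost". A
non-zero exit status means at least one of the records did not resolve.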


Step 5) Set up smb.conf

The provision process above will have set up a basic smb.conf in
$HOME/prefix.s4/etc/smb.conf. You should edit that to add a few more
useful things to the [global] section like this:

	interfaces      = 127.0.0.1/8 virbr0
	bind interfaces only = yes
	dreplsrv:periodic_interval = 10
	dreplsrv:periodic_startup_interval = 5

This sets up Samba4 to only listen on loopback and virbr0. The reason
we do that is we are going to be starting two copies of Samba on this
machine, replicating to each other, and we don't want the two to
collide when they both try to listen on the same network
interface. When we set up the 2nd copy of Samba4 we will set it to
listen on a different interface.

The dreplsrv options tell Samba4 to do a replication every 10
seconds (5 seconds for the first one). This is useful when doing
initial testing, and is also needed at the moment because (as of today)
Samba doesn't send DsReplicaSync messages to tell the other DCs to
replicate when a change happens in the Samba DB. So by doing the above
we are just saying "replicate a lot".


Step 6) Start the first copy of Samba

In a new terminal, go to the source4 directory again and run this:

  sudo bin/samba -i -M single -d4

That starts Samba4, and you will be running a DC in interactive
mode. That is most useful for debugging/developing this code. It will
put out messages like this every 10 seconds:

 dreplsrv_periodic_run(): schedule pull replication
 dreplsrv_periodic_run(): run pending_ops
 dreplsrv_periodic_schedule(10) scheduled for: Fri Sep 11 11:12:20 2009 EST

That is the replication engine running. At the moment we only have one
DC, so it isn't doing any real work, but you can see it trying.

When I'm debugging this, I often run this command:

  make bin/samba && sudo gdb --args bin/samba -i -M single -d4

That allows me to run Samba4 under gdb, and break into any of the
interesting DRS routines to watch them happen.


Step 7) Set up a 2nd copy of Samba4 on the same machine

This is where we create a 2nd copy of Samba4, running on the same
machine, and we will set up the two copies to replicate to each other.

You will need a 2nd network interface for this. You can either create
one using ifconfig magic, or you can use an existing one. I use the
virbr1 interface, which is another interface created by having kvm
installed on my machine (the first one was virbr0 which I used above
for the first instance of Samba4).

So I have two interfaces like this:

virbr0    Link encap:Ethernet  HWaddr aa:7a:de:62:e6:fc
          inet addr:10.0.0.1  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

virbr1    Link encap:Ethernet  HWaddr 3a:45:04:48:bf:06
          inet addr:10.0.1.1  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

We need to create an install directory for the 2nd copy of Samba4. We
don't need to make it a full copy; the following commands will do the
job:

  mkdir $HOME/prefix.s4.2
  cd $HOME/prefix.s4.2
  mkdir -p private etc var var/lib var/run var/locks var/ncalrpc

Now we need an smb.conf in that 2nd directory:

  cp $HOME/prefix.s4/etc/smb.conf $HOME/prefix.s4.2/etc/

and we need to edit that smb.conf to look something like this:

[globals]
	netbios name	= blu2
	workgroup	= bludom
	realm		= bludom.tridgell.net
	server role     = domain controller
	interfaces      = virbr1
	bind interfaces only = yes
	dreplsrv:periodic_interval = 10
	dreplsrv:periodic_startup_interval = 5

        ncalrpc dir = /home/tridge/prefix.s4.2/var/ncalrpc
        private dir = /home/tridge/prefix.s4.2/private
        swat directory = /home/tridge/prefix.s4.2/share/swat
        lock dir = /home/tridge/prefix.s4.2/var/locks
        pid directory = /home/tridge/prefix.s4.2/var/run
        winbindd socket directory = /home/tridge/prefix.s4.2/var/run/winbindd
        winbindd privileged socket directory = /home/tridge/prefix.s4.2/var/lib/winbindd_privileged
        ntp signd socket directory = /home/tridge/prefix.s4.2/var/run/ntp_signd

[netlogon]
	path = /home/tridge/prefix.s4/var/locks/sysvol/bludom.tridgell.net/scripts
	read only = no

[sysvol]
	path = /home/tridge/prefix.s4/var/locks/sysvol
	read only = no

Notice that we are overriding all the directories to point at our 2nd
copy. This means that when we use the -s option to Samba tools to
point at this smb.conf it will also point all the other important
directories at the right place.

Notice also that we are setting the netbios name to a different name
from the main name of this machine. The 2nd copy of Samba is called
'blu2' whereas the first one is called 'blu'.

You'll need to make sure that 'blu' and 'blu2' resolve to the right
IPs. For example, check your /etc/hosts and make sure they are either
not there (and use DNS) or that they point at the two IPs you are
using for this test.
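
The directory creation and smb.conf editing in this step could be
scripted roughly as follows. Treat it as a sketch under assumptions:
make_second_prefix and its arguments are hypothetical, and the settings
it writes are simply the ones from the smb.conf shown above (including
the [netlogon] and [sysvol] paths, which point at the first prefix).

```shell
#!/bin/sh
# Sketch: create the directory skeleton and smb.conf for a 2nd instance.
# make_second_prefix is a hypothetical helper. Arguments:
#   $1 first prefix   $2 second prefix   $3 netbios name
#   $4 interface      $5 workgroup       $6 realm
make_second_prefix() {
    p1=$1 p2=$2 name=$3 iface=$4 domain=$5 realm=$6
    for d in private etc var/lib var/run var/locks var/ncalrpc; do
        mkdir -p "$p2/$d" || return 1
    done
    # Write the same settings as the hand-edited smb.conf above.
    cat > "$p2/etc/smb.conf" <<EOF
[globals]
    netbios name = $name
    workgroup = $domain
    realm = $realm
    server role = domain controller
    interfaces = $iface
    bind interfaces only = yes
    dreplsrv:periodic_interval = 10
    dreplsrv:periodic_startup_interval = 5

    ncalrpc dir = $p2/var/ncalrpc
    private dir = $p2/private
    swat directory = $p2/share/swat
    lock dir = $p2/var/locks
    pid directory = $p2/var/run
    winbindd socket directory = $p2/var/run/winbindd
    winbindd privileged socket directory = $p2/var/lib/winbindd_privileged
    ntp signd socket directory = $p2/var/run/ntp_signd

[netlogon]
    path = $p1/var/locks/sysvol/$realm/scripts
    read only = no

[sysvol]
    path = $p1/var/locks/sysvol
    read only = no
EOF
}
```

For the setup described here the call would be something like
"make_second_prefix $HOME/prefix.s4 $HOME/prefix.s4.2 blu2 virbr1 bludom bludom.tridgell.net".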


Step 8) run net vampire

This is the step where we will 'vampire' the first DC into the 2nd
one, so that the 2nd DC gets a copy of the entire LDAP directory. The
command we need is this (run from the source4 directory again):

   bin/net -s $HOME/prefix.s4.2/etc/smb.conf vampire -Uadministrator%penguin bludom.tridgell.net -d4

If all goes well you'll see a message like this after a minute or so:

  Vampired domain BLUDOM (S-1-5-21-2939463048-3248118054-807635424)

This is the step that we got working yesterday, so if you get this
working then you are probably one of the first people in the world to
do so!


Step 9) check with ldbsearch

Now let's check that the vampire did the right thing. Let's look at
the administrator account for the two DCs:

  bin/ldbsearch -H $HOME/prefix.s4/private/sam.ldb samaccountname=administrator
  bin/ldbsearch -H $HOME/prefix.s4.2/private/sam.ldb samaccountname=administrator

Those two searches should give the same result. Right now there are
some bugs, and the two are not quite identical, but they are close.
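
Comparing the two outputs by eye gets tedious, so a crude diff helper
might be useful. The sketch below assumes you have saved each ldbsearch
output to a file; ldif_normalise and ldif_diff are hypothetical names.
It drops comment and blank lines and sorts the rest, so attribute order
does not matter. It does not rejoin folded (continuation) lines, so very
long values may show up as spurious differences.

```shell
#!/bin/sh
# Sketch: compare two saved ldbsearch outputs attribute by attribute.
# Comment lines ("# record 1" etc.) and blank lines are dropped and the
# remaining lines sorted, so attribute ordering does not matter.
# Limitation: folded (continuation) lines are not rejoined.
ldif_normalise() {
    grep -v -e '^#' -e '^$' "$1" | sort
}

# Exit status follows diff: 0 if identical, 1 if different.
ldif_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    ldif_normalise "$1" > "$a"
    ldif_normalise "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

For example, redirect the two ldbsearch commands above into dc1.ldif and
dc2.ldif and then run "ldif_diff dc1.ldif dc2.ldif".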


Step 10) Start the 2nd copy of Samba

We are now ready to start the 2nd DC, and it should start replicating
with the first one. With the first copy of Samba still running in
another terminal, run this to start the 2nd one:

  sudo bin/samba -s $HOME/prefix.s4.2/etc/smb.conf -i -M single -d4

Within 10 seconds you should see messages about the two DCs
replicating with each other. You may well run into some
kerberos/DNS bugs (I'm working on that today), but the two DCs should
at least start up.

If things are working right, a change you make on one of the DCs with
ldbedit should show up on the other DC within 10 seconds. This is the
bit that is most fragile right now, so don't
expect it to work perfectly.
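
One way to check for a replicated change without staring at the logs is
to poll the other DC's database until the change appears. Here is a
minimal sketch of such a loop; wait_for is a hypothetical helper, not a
Samba tool, and it simply retries whatever command you give it once a
second.

```shell
#!/bin/sh
# Sketch: retry a command once a second until it succeeds or a timeout
# (in seconds) expires. wait_for is a hypothetical helper.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        # Give up once the timeout has been reached.
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "succeeded after ${elapsed}s"
}
```

For instance, after changing the description of the administrator
account on the first DC to "replication test" (an arbitrary example
value), you could run this from the source4 directory:

  wait_for 30 sh -c 'bin/ldbsearch -H $HOME/prefix.s4.2/private/sam.ldb samaccountname=administrator description | grep -q "replication test"'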

If you've reached this stage then congratulations, you now have a pair
of replicating Samba4 DCs. Now it's over to you to help develop this
further!


Step 11) Replicating with windows

You could instead substitute a windows box for one of the DCs in the
above test. If you do, you'll need to run scripting/bin/setup_dns.sh
after the vampire step to put the right DNS entries in the windows DNS
server. You may also find that Windows crashes the first time it tries
to replicate with us.


Step 12) Where to from here?

The above is a good start, but there is _lots_ more to do. For a
start, the above steps are very fragile. You will probably find things
failing quite often, and if you use ldbsearch you'll see that the
replication isn't perfect. For example, the replica may get two copies
of the 'cn' attribute for every record, or it may be missing the
parentGUID attributes, and there are many other problems.

It also hasn't yet been made to fully work bi-directionally when one
of the replicas is a windows box (without crashing windows), and we
haven't yet tried having 3 or more DCs. We also haven't tried more
than one domain in a forest, and we don't automatically do all the
right TSIG-GSS DNS updates (though please look in
scripting/bin/setup_dns.sh for a start on that).

Anatoliy is working on implementing the "Uptodateness vector", and
I'm working on trying to get the SPN and DNS stuff right. Andrew
Bartlett is working on trying to get this integrated into "make test"
so that it will stay working, and Metze is watching over us to make
sure we don't break too much of his great groundwork.

To go from this point to something that could be used in production
will take a huge effort, but I am delighted that we have come as far
as we have so quickly. I think we'll be able to make much faster
progress now that we can demonstrate the first Samba<->Samba DRS
replication.

Cheers, Tridge


More information about the samba-technical mailing list