[Samba] BIND9.8 DLZ performance issue

Arthur Ramsey arthur_ramsey at mediture.com
Thu Oct 13 15:11:25 UTC 2016


I got core dumps when the issue was happening.  Here are the backtraces: 
http://pastebin.com/N0e2fsSQ.

Seems to be TDB contention?

Thanks,
Arthur

On 10/7/2016 11:12 AM, Arthur Ramsey wrote:
>
> I'm hoping the issue is just load balancing, but I'm not sure. I can't
> seem to get the traffic balanced across two DCs.
>
> I ran this script on all Linux nodes to balance the traffic.
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> my $primary_name_server;
> my $random = int(rand(10));
>
> open(my $resolv_conf_fh, '<', '/etc/resolv.conf') or die("Unable to open /etc/resolv.conf for reading: $!");
> while(<$resolv_conf_fh>) {
>      chomp;
>      if ($_ =~ /nameserver (.*)/) {
>          $primary_name_server = $1;
>          last;
>      }
> }
> close($resolv_conf_fh);
>
> if (! defined($primary_name_server) || $primary_name_server eq '192.168.168.64' || $primary_name_server eq '192.168.168.65') {
>      open(my $resolv_conf_fh, '>', '/etc/resolv.conf') or die("Unable to open /etc/resolv.conf for writing: $!");
>      print $resolv_conf_fh "search mediture.dom\n";
>      print $resolv_conf_fh "options rotate timeout:1\n";
>      if ($random >= 4) {
>          print $resolv_conf_fh "nameserver 192.168.168.64\n";
>          print $resolv_conf_fh "nameserver 192.168.168.65\n";
>      } else {
>          print $resolv_conf_fh "nameserver 192.168.168.65\n";
>          print $resolv_conf_fh "nameserver 192.168.168.64\n";
>      }
>      close($resolv_conf_fh);
>      
>      if (-f '/usr/bin/wbinfo') {
>          open(my $krb5_conf_fh, '>', '/etc/krb5.conf') or die("Unable to open /etc/krb5.conf for writing: $!");
>          print $krb5_conf_fh q([logging]
>   default =FILE:/var/log/krb5libs.log
>   kdc =FILE:/var/log/krb5kdc.log
>   admin_server =FILE:/var/log/kadmind.log
>   default_realm = MEDITURE.DOM
>
> [libdefaults]
>   default_realm = MEDITURE.DOM
>   dns_lookup_realm = false
>   dns_lookup_kdc = false
>   ticket_lifetime = 24h
>   renew_lifetime = 7d
>   forwardable = true
>   default_keytab_name =FILE:/etc/krb5.keytab
>
> [realms]
>   MEDITURE.DOM = {
> );
>          if ($random >= 4) {
>          	print $krb5_conf_fh " kdc = dc01.mediture.dom\n";
>          	print $krb5_conf_fh " kdc = dc03.mediture.dom\n";
>          	print $krb5_conf_fh " kdc = dc02.mediture.dom\n";
>          	print $krb5_conf_fh " kdc = dc04.mediture.dom\n";
>          } else {
>          	print $krb5_conf_fh " kdc = dc03.mediture.dom\n";
>          	print $krb5_conf_fh " kdc = dc01.mediture.dom\n";
>          	print $krb5_conf_fh " kdc = dc04.mediture.dom\n";
>          	print $krb5_conf_fh " kdc = dc02.mediture.dom\n";
>          }
>          print $krb5_conf_fh q(  default_realm = MEDITURE.DOM
>   }
>
> [domain_realm]
>    mediture.dom = MEDITURE.DOM
>    .mediture.dom = MEDITURE.DOM);
>          close($krb5_conf_fh);
>          
>          open(my $smb_conf_fh, '>', '/etc/samba/smb.conf') or die("Unable to open /etc/samba/smb.conf for writing: $!");
>          print $smb_conf_fh q([global]
> #--authconfig--start-line--
>     workgroup = MEDITURE
>     password server = );
>          if ($random >= 4) {
>          	print $smb_conf_fh 'dc01.mediture.dom ';
>          	print $smb_conf_fh 'dc03.mediture.dom ';
>          	print $smb_conf_fh 'dc02.mediture.dom ';
>          	print $smb_conf_fh 'dc04.mediture.dom';
>          } else {
>          	print $smb_conf_fh 'dc03.mediture.dom ';
>          	print $smb_conf_fh 'dc01.mediture.dom ';
>          	print $smb_conf_fh 'dc04.mediture.dom ';
>          	print $smb_conf_fh 'dc02.mediture.dom';
>          }
>          print $smb_conf_fh q(
>     realm = MEDITURE.DOM
>     security = ads
>     
>     template homedir = /home/%U
>     template shell = /bin/bash
>
>     winbind use default domain = true
>
> #--authconfig--end-line--
>     server string = Samba Server Version %v
>
>     # logs split per machine
>     log file = /var/log/samba/log.%m
>     # max 50KB per log file, then rotate
>     max log size = 50
>     
>     passdb backend = tdbsam
>     
>     winbind refresh tickets = yes
>     winbind offline logon = yes
>     winbind use default domain = yes
>     winbind nss info = rfc2307
>     winbind enum users = yes
>     winbind enum groups = yes
>     winbind nested groups = yes
>     
>     kerberos method = secrets and keytab
>     
>     idmap config *: backend = tdb
>     idmap config *: range = 90000001-100000000
>     
>     idmap config MEDITURE: backend = ad
>     idmap config MEDITURE: range = 10000-49999
>     idmap config MEDITURE: schema mode = rfc2307);
>          close($smb_conf_fh);
>      }
> }
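
A note on the split the script above produces: int(rand(10)) yields the integers 0 through 9, and the ">= 4" test holds for six of those ten values, so roughly 60% of nodes would list 192.168.168.64 and dc01 first rather than 50%. The sketch below just enumerates the cases; count_first_branch is a made-up helper name for illustration, not part of the original script.

```shell
#!/bin/sh
# Illustrative only: count_first_branch is a hypothetical helper.
# Counts how many of the integers 0..(n-1) satisfy "value >= threshold",
# mirroring the int(rand(n)) >= threshold test in the Perl script.
count_first_branch() {
    n=$1
    threshold=$2
    hits=0
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ "$i" -ge "$threshold" ]; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
    done
    echo "$hits"
}

echo "threshold 4: $(count_first_branch 10 4) of 10 values take the first branch"
echo "threshold 5: $(count_first_branch 10 5) of 10 values take the first branch"
```

A threshold of 5 would give an even split. A 60/40 bias is far smaller than the imbalance reported later in this message, so it is unlikely to be the whole story.
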
> I also have AD sites set up and have manually configured SRV records to
> perform load balancing.
> $ dig +short srv _ldap._tcp.vsc._sites.dc._msdcs.mediture.dom
> 0 50 389 dc02.mediture.dom.
> 0 25 389 dc04.mediture.dom.
> 0 100 389 dc01.mediture.dom.
> 0 100 389 dc03.mediture.dom.
>
> $ dig +short srv _ldap._tcp.aws._sites.dc._msdcs.mediture.dom
> 0 25 389 dc02.mediture.dom.
> 0 100 389 dc04.mediture.dom.
> 0 50 389 dc01.mediture.dom.
> 0 50 389 dc03.mediture.dom.
>
> $ dig +short srv _ldap._tcp.epo._sites.dc._msdcs.mediture.dom
> 0 25 389 dc04.mediture.dom.
> 0 100 389 DC02.mediture.dom.
> 0 50 389 dc01.mediture.dom.
> 0 50 389 dc03.mediture.dom.
>
> $ dig +short srv _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.mediture.dom
> 0 100 389 dc01.mediture.dom.
> 0 100 389 dc03.mediture.dom.
>
> $ dig +short srv _ldap._tcp.vsc._sites.mediture.dom
> 0 100 389 dc01.mediture.dom.
> 0 100 389 dc03.mediture.dom.
> 0 50 389 dc02.mediture.dom.
> 0 25 389 dc04.mediture.dom.
>
> $ dig +short srv _ldap._tcp.aws._sites.mediture.dom
> 0 100 389 dc04.mediture.dom.
> 0 50 389 dc01.mediture.dom.
> 0 50 389 dc03.mediture.dom.
> 0 25 3268 dc02.mediture.dom.
>
> $ dig +short srv _ldap._tcp.epo._sites.mediture.dom
> 0 25 389 dc04.mediture.dom.
> 0 100 389 dc02.mediture.dom.
> 0 50 389 dc01.mediture.dom.
> 0 50 389 dc03.mediture.dom.
>
> $ dig +short srv _ldap._tcp.Default-First-Site-Name._sites.mediture.dom
> 0 100 389 dc04.mediture.dom.
> 0 100 389 dc01.mediture.dom.
> 0 100 389 dc02.mediture.dom.
> 0 100 389 dc03.mediture.dom.
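
For reference, a rough sketch of the share each DC would be expected to receive from the vsc site records above, assuming a client that picks among equal-priority SRV targets in proportion to their weight, as RFC 2782 describes. The function name srv_shares is made up for this illustration, and whether a given client actually honors the weights is not something this thread establishes.

```shell
#!/bin/sh
# Illustrative only: srv_shares is a hypothetical helper, not a Samba or BIND tool.
# Reads "priority weight port target" lines (dig +short srv output) on stdin and
# prints each target's weight as a percentage of the total weight.
# Assumes all records share one priority, as in the output above.
srv_shares() {
    awk '
        NF >= 4 { weight[$4] += $2; total += $2 }
        END {
            if (total == 0) exit
            for (host in weight)
                printf "%s %.1f%%\n", host, 100 * weight[host] / total
        }
    ' | sort
}

# Records copied from the _ldap._tcp.vsc._sites.dc._msdcs.mediture.dom query.
srv_shares <<'EOF'
0 50 389 dc02.mediture.dom.
0 25 389 dc04.mediture.dom.
0 100 389 dc01.mediture.dom.
0 100 389 dc03.mediture.dom.
EOF
```

The four weights sum to 275, so dc01 and dc03 would each be expected to take about 36.4% of selections, dc02 about 18.2%, and dc04 about 9.1%. On weights alone, dc01 and dc03 should see similar traffic.
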
> I'm not seeing balanced traffic though.
> [root@dc01 ~]# netstat -an | grep 445 | grep -c ESTABLISHED
> 164
> [root@dc03 ~]# netstat -an | grep 445 | grep -c ESTABLISHED
> 10
>
> [root@dc01 ~]# netstat -an | grep 88 | grep -c ESTABLISHED
> 20
> [root@dc03 ~]# netstat -an | grep 88 | grep -c ESTABLISHED
> 2
>
> [root@dc01 ~]# netstat -an | grep 389 | grep -c ESTABLISHED
> 175
> [root@dc03 ~]# netstat -an | grep 389 | grep -c ESTABLISHED
> 23
>
> [root@dc01 ~]# netstat -an | grep 636 | grep -c ESTABLISHED
> 3
> [root@dc03 ~]# netstat -an | grep 636 | grep -c ESTABLISHED
> 7
>
> [root@dc01 ~]# netstat -an | grep 53 | grep -c ESTABLISHED
> 42
> [root@dc03 ~]# netstat -an | grep 53 | grep -c ESTABLISHED
> 6
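
The grep filters above match the port number anywhere in the line, so "grep 53" also counts connections whose ephemeral port or IP address merely contains 53. A stricter count that matches the local port exactly might look like the sketch below. The name count_established is made up, and the column positions assume the usual Linux net-tools "netstat -an" layout (protocol, two queue columns, local address, foreign address, state).

```shell
#!/bin/sh
# Illustrative only: count_established is a hypothetical helper.
# Reads "netstat -an" output on stdin and counts TCP connections in the
# ESTABLISHED state whose local address ends in ":<port>".
#   $1 = local port number to match exactly
# Assumes the state is field 6 and the local address is field 4.
count_established() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Usage on a DC (not run here):
#   netstat -an | count_established 445
```

This counts only inbound service connections on the given port. It would not by itself explain a 164 versus 10 split, but it removes one source of noise when comparing the two DCs.
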
> I only have a handful of Windows instances joined to the domain at 
> that site, VSC, but over 100 Linux nodes.
>
> Thanks,
> Arthur
>
> On 09/29/2016 10:16 AM, Arthur Ramsey wrote:
>> Hello,
>>
>> I'm running Samba 4.5.0 and bind-9.8.2-0.47.rc1.el6_8.1.  One DC of
>> four, the PDC, is orders of magnitude slower at running
>> /usr/local/samba/sbin/samba_dnsupdate --verbose --all-names. While
>> that is running on that DC, it seems to block all queries. The load
>> average is usually under 0.5.  The DC was unsafely halted, which
>> could have corrupted something.  I ran a dbcheck with samba-tool and
>> it came back clean other than the expected cleanup after upgrading to
>> 4.5.0.  Are there any caches or similar that I could try clearing for
>> BIND?  Usually at least once a day the memory usage climbs from the
>> typical ~1 GB to everything the box has, 8 GB physical and
>> 10 GB swap, requiring a forced restart, so there appears to be a
>> memory leak as well.  When memory usage is high, it comes from the
>> smbd process, which I wouldn't expect to correlate with BIND.
>> Or, rather than a memory leak, is the blocking seen with DNS queries
>> also blocking SMB clients, resulting in a pile of connections and high
>> memory usage? The load under this condition is very high, but that is
>> due to high IO and CPU usage from swapping.  I had similar behavior
>> with 4.4.5, but it was fine for the first couple of weeks after the
>> upgrade.
>>
>> Thanks,
>> Arthur




