[Samba] Overloaded Samba server. Is it a bug?

Martin Scandroli masc at intraredes.com
Thu Oct 27 06:12:02 GMT 2005


Experts,

We've just migrated from Samba 2.2.8a to Samba 3.0.20b in a very large
corporate environment. Everything worked fine in our lab, but we began
experiencing serious load problems on the production servers the morning
after the migration. I'll try to briefly describe the scenario:

Resources:

Old Environment:

        Hardware:
                Dell PowerEdge 2650
                        Intel Xeon processor
                        2 GB RAM
                        RAID 5 (via PERC RAID controller) on 10k SCSI disks
        Software:
                SuSE Linux Enterprise Server 8
                Samba 2.2.8a servers
                CUPS printing service
                openldap2 as backend (with replicas all over the country,
                about 3000 objects in the tree)
                HeartBeat as high-availability service

Everything ran beautifully here!


New Environment:

        Hardware:
                Dell PowerEdge 2850 servers
                        2 Intel Xeon 3.2 GHz processors (Hyper-Threading,
                        I think; I see 4 of them)
                        4 GB RAM
                        RAID 5 (via PERC RAID controller) on 15k SCSI disks

        Software:
                SuSE Linux Enterprise Server 9
                Samba 3.0.20b servers
                CUPS printing service
                Novell eDirectory 8.7.3.4 as backend (very distributed too,
                about 4000 objects in the tree)
                HeartBeat as high-availability service
                DRBD to keep the Samba configuration replicated among the
                cluster nodes

Problems we're having (or have had; the solved ones are listed just as
useful feedback):

eDirectory turned out to be much slower than openldap2 when responding
to nss_ldap queries (I mean... about 7 or 8 times slower!), so queries
asking for the members of large groups (i.e. groups with about 1500 users
or more) usually ended in an RPC timeout.
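One way to put a number on the nss_ldap slowdown is to time a group lookup through the normal NSS path (the same path `getent group <name>` uses from the shell). The sketch below is illustrative only: it assumes a Python 3 interpreter on the server, and any group name you pass in stands for one of the ~1500-member groups. A running nscd may cache results and hide the backend latency on repeated calls.

```python
import grp
import time
from typing import Tuple


def time_group_lookup(group_name: str) -> Tuple[float, int]:
    """Resolve a group through NSS (files, nss_ldap, ...) and time the call.

    Returns (elapsed_seconds, number_of_listed_members).
    Raises KeyError if NSS cannot find the group.
    """
    start = time.perf_counter()
    # getgrnam follows /etc/nsswitch.conf, so with nss_ldap this hits the directory.
    entry = grp.getgrnam(group_name)
    elapsed = time.perf_counter() - start
    # gr_mem lists only explicit (secondary) members; users whose primary
    # group is this group are not included.
    return elapsed, len(entry.gr_mem)
```

For example, calling `time_group_lookup("bigusers")` (a hypothetical group name) against the openldap2 setup and then against eDirectory would show the difference directly.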

Everything started to work when we added the ldapsam:trusted=yes
parameter. It dramatically reduced response times, and the affected
queries began to work.
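For reference, ldapsam:trusted is a [global] parameter in smb.conf; as we understand the smb.conf documentation, it lets Samba query the directory directly instead of going through NSS for these lookups. Samba's own `testparm` is the authoritative way to see the effective value. The Python sketch below only illustrates where the setting lives: the sample configuration (including the LDAP URL) is hypothetical, and the parser is a simplification that ignores `include =`, inline comments and line continuations.

```python
import configparser

# Hypothetical excerpt; the server URL is a placeholder, not a real setup.
SAMPLE_SMB_CONF = """
[global]
   passdb backend = ldapsam:ldap://ldap.example.com
   ldapsam:trusted = yes
"""


def ldapsam_trusted_enabled(smb_conf_text: str) -> bool:
    """Return True if [global] sets ldapsam:trusted to yes/true/1."""
    # smb.conf separates names from values with "=" only; configparser's default
    # ":" delimiter would misread "ldapsam:trusted" as the key "ldapsam".
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    # Strip indentation so indented parameters are not taken as continuation lines.
    flat = "\n".join(line.strip() for line in smb_conf_text.splitlines())
    parser.read_string(flat)
    # Section names are case-insensitive in smb.conf but not in configparser.
    sections = {name.lower(): name for name in parser.sections()}
    if "global" not in sections:
        return False
    value = parser.get(sections["global"], "ldapsam:trusted", fallback="no")
    return value.strip().lower() in ("yes", "true", "1")
```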
Enabling this feature caused some other problems (we've found
workarounds, but I'll describe them anyway to provide some feedback).

        1) The Samba server used to die seconds after it was started.
        Something about the nobody user and its primary group prevented it
        from working properly. We solved this by adding the user nobody and
        its corresponding primary group to the backend.
        2) The root user was no longer recognized (we're still trying to
        figure out why; the user has been added to the tree, but nothing
        changed), so as a workaround we used the new role-based
        administration provided by Samba 3 (SeMachineAccountPrivilege, ...),
        and we've had no more trouble with it.



        3) THIS ISSUE IS KILLING US!

Something happens at a certain time of day (rush hour). Everything is
running smoothly (load average of 0.3 to 0.4) when the load starts to
grow without limit. It rises from 0.3 to 50 in a matter of seconds, and
it keeps growing until the server dies. We couldn't find the reason, but
it happens within a two-hour interval. Before and after this interval,
there are no errors of any kind.
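To gather data during that two-hour window, a small watcher could log the load average together with how many smbd processes exist and which scheduler state they are in. On Linux the load average counts processes that are runnable (R) or in uninterruptible sleep (D, typically waiting on I/O), so the state breakdown helps tell CPU contention apart from I/O waits. This is a hypothetical Python 3 sketch that reads /proc directly; the threshold and interval values are placeholders.

```python
import os
import time
from collections import Counter
from datetime import datetime
from typing import Callable, Optional, Tuple


def parse_stat(data: str) -> Tuple[str, str]:
    """Extract (command name, state letter) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the LAST closing parenthesis.
    lpar = data.index("(")
    rpar = data.rindex(")")
    comm = data[lpar + 1:rpar]
    state = data[rpar + 1:].split()[0]
    return comm, state


def process_states(name: str = "smbd", proc_root: str = "/proc") -> Counter:
    """Count processes called `name` by scheduler state (R, S, D, ...)."""
    states: Counter = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file was unreadable
        if comm == name:
            states[state] += 1
    return states


def watch(threshold: float = 5.0, interval: float = 10.0,
          samples: Optional[int] = None,
          log: Callable[[str], None] = print) -> None:
    """Log load and smbd states whenever the 1-minute load exceeds `threshold`."""
    taken = 0
    while samples is None or taken < samples:
        load1, load5, load15 = os.getloadavg()
        if load1 > threshold:
            states = process_states("smbd")
            log(f"{datetime.now():%Y-%m-%d %H:%M:%S} "
                f"load={load1:.2f}/{load5:.2f}/{load15:.2f} "
                f"smbd={sum(states.values())} states={dict(states)}")
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Started under nohup before rush hour, something like `watch(threshold=5.0)` would leave a timestamped trace of how the spike builds up.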

        I'll paste some of the log errors (just the ones I saw). I don't think
they're the cause of our problems, but you're the experts.

Any clues? Do you need me to gather any particular information? Has any
DoS bug been reported for this Samba version?

        Any help will be greatly appreciated.

Regards, 
Martin

--

        From /var/log/messages:

        Oct 25 04:34:15 srvsmb01 smbd[2961]: [2005/10/25 04:34:15, 0] lib/util_sock.c:send_smb(762)
        Oct 25 04:34:15 srvsmb01 smbd[2961]:   Error writing 4 bytes to client. -1. (Connection reset by peer)
        Oct 25 04:40:36 srvsmb01 smbd[2983]: [2005/10/25 04:40:36, 0] lib/util_sock.c:get_peer_addr(1222)
        Oct 25 04:40:36 srvsmb01 smbd[2983]: getpeername failed. Error was Transport endpoint is not connected
        Oct 25 04:40:36 srvsmb01 smbd[2983]: [2005/10/25 04:40:36, 0] lib/util_sock.c:write_data(554)
        Oct 25 04:40:36 srvsmb01 smbd[2983]: write_data: write failure in writing to client 167.252.104.98. Error Connection reset by peer

        (this happens very often)
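To check whether these socket errors cluster around the load spikes, they could be bucketed per minute. A minimal Python sketch, assuming the standard syslog line layout shown above (one entry per line, before mail wrapping); the error labels are arbitrary names chosen for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# "Oct 25 04:34:15 srvsmb01 smbd[2961]: message"; group 1 keeps the minute.
SYSLOG_SMBD = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}):\d{2} \S+ smbd\[\d+\]:\s*(.*)$")

# Hypothetical labels for the two error texts seen in /var/log/messages.
ERRORS = {
    "reset_by_peer": "Connection reset by peer",
    "not_connected": "Transport endpoint is not connected",
}


def socket_errors_per_minute(lines: Iterable[str]) -> Counter:
    """Count smbd socket errors keyed by (minute, label)."""
    counts: Counter = Counter()
    for line in lines:
        match = SYSLOG_SMBD.match(line)
        if not match:
            continue  # not an smbd syslog line
        minute, text = match.groups()
        for label, needle in ERRORS.items():
            if needle in text:
                counts[(minute, label)] += 1
    return counts
```

Feeding it an open /var/log/messages file object and sorting the result by minute would show whether the error rate rises before the load does or along with it.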

        From /var/log/samba/log.nmbd:

        tdb(unnamed): tdb_open_ex: /var/lib/samba/unexpected.tdb (2059,2959) is already open in this process
        [2005/10/26 04:17:01, 2] tdb/tdbutil.c:tdb_log(767)

        (this pair of lines repeats 19 times in a row, with tdb_log
        timestamps from 2005/10/26 04:17:01 to 04:17:02)

        From /var/log/samba/log.smbd:

        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:28, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 1 try!
        [2005/10/25 01:29:29, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:29, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 2 try!
        [2005/10/25 01:29:29, 2] smbd/close.c:close_normal_file(270)
        cmqtbe4 closed file Planta/TPM/Envasado/Linea4/LLENADORA/Merma Linea 4.xls (numopen=0)
        [2005/10/25 01:29:29, 2] smbd/open.c:open_file(372)
        CMQTBE4 opened file Planta/TPM/Envasado/Linea4/LLENADORA/Merma Linea 4.xls read=No write=Yes (numopen=1)
        [2005/10/25 01:29:29, 2] smbd/close.c:close_normal_file(270)
        cmqtbe4 closed file Planta/TPM/Envasado/Linea4/LLENADORA/Merma Linea 4.xls (numopen=0)
        [2005/10/25 01:29:30, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:30, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 3 try!
        [2005/10/25 01:29:31, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:31, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 4 try!
        [2005/10/25 01:29:32, 2] rpc_server/srv_spoolss_nt.c:find_printer_index_by_hnd(270)
        find_printer_index_by_hnd: Printer handle not found: _spoolss_writeprinter: Invalid handle (OTHER:15976:11737)
        [2005/10/25 01:29:32, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:32, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 5 try!
        [2005/10/25 01:29:33, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:33, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 6 try!
        [2005/10/25 01:29:34, 2] smbd/sesssetup.c:setup_new_vc_session(704)
        setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all old resources.
        [2005/10/25 01:29:34, 2] smbd/sesssetup.c:setup_new_vc_session(704)
        setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all old resources.
        [2005/10/25 01:29:34, 0] lib/smbldap.c:smbldap_open(822)
        smbldap_open: cannot access LDAP when not root..
        [2005/10/25 01:29:34, 1] lib/smbldap.c:another_ldap_try(951)
        Connection to LDAP server failed for the 7 try!
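Samba's own log files (log.smbd, log.nmbd) use a two-line layout: a header such as `[2005/10/25 01:29:29, 0] lib/smbldap.c:smbldap_open(822)` followed by the message. Counting headers by source location is a quick way to see how much of a log is the LDAP retry loop versus normal file activity. The Python sketch below assumes exactly that header format, as it appears in the excerpts above, and is only a triage aid.

```python
import re
from collections import Counter
from typing import Iterable

# "[2005/10/25 01:29:29, 0] lib/smbldap.c:smbldap_open(822)"
SAMBA_HEADER = re.compile(
    r"^\s*\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}), (\d+)\] (\S+)\(\d+\)\s*$"
)


def count_by_location(lines: Iterable[str], max_level: int = 10) -> Counter:
    """Count Samba log entries by "file:function", keeping debug level <= max_level."""
    counts: Counter = Counter()
    for line in lines:
        match = SAMBA_HEADER.match(line)
        # Message lines do not match the header pattern and are skipped.
        if match and int(match.group(2)) <= max_level:
            counts[match.group(3)] += 1
    return counts
```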


