[Samba] Winbindd still has bottlenecks when used with interdomain trusts.

Wed Feb 28 22:55:00 GMT 2007


Good evening Volker,

great to get an answer from YOU!

>
> Yes. The async stuff in Winbind is to enable parallel
> operations to different trusted domains. Each domain is
> serialized still.
>
I understand. Hm, that's bad for me.

> So the login as such (the SamLogon call verifying the user's
> pw) is not your problem, it's the sid2gid calls that follow?
>
> Hmm. Why does a trusting call the central DC for these? For
> the SamLogon calls yes, but the sid2gid stuff? Or do you
> mean sid2name? There's something I don't get here.
>
This is the "patch" we used for Samba 3.0.14a:

--- ./source/sam/idmap_ldap.c   2005-03-11 14:47:05.000000000 +0100
+++ ../samba-3.0.14a-p2/source/sam/idmap_ldap.c 2005-11-28 23:41:15.000000000 +0100
@@ -497,7 +497,8 @@
        if ( id_type & ID_USERID )
                type = get_attr_key2string( idpool_attr_list, LDAP_ATTR_UIDNUMBER );
        else
-               type = get_attr_key2string( idpool_attr_list, LDAP_ATTR_GIDNUMBER );
+               return ret;
+               //type = get_attr_key2string( idpool_attr_list, LDAP_ATTR_GIDNUMBER );
 
        pstrcpy( suffix, lp_ldap_idmap_suffix() );
        pstr_sprintf(filter, "(&(objectClass=%s)(%s=%d))",


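As you can see, it was a workaround for Samba from before you implemented
the new winbindd: for anything that is not a user ID the function simply
returns early, so no LDAP query is made for group IDs. It was only meant to
speed everything up and unblock winbind as quickly as possible.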

With Samba 3.0.24, the problem may be that there is only one worker child
per domain, as I have just found out in winbindd_sid.c:idmap_child(void).

To illustrate the actual problem, I added a debug statement to
winbindd_dual.c:schedule_async_request():


static void schedule_async_request(struct winbindd_child *child)
{
    struct winbindd_async_request *request = child->requests;

    if (request == NULL) {
        return;
    }

    if (child->event.flags != 0) {
        DEBUG(0, ("BUSY!!\n"));    /* <-- DEBUG STATEMENT */
        return;        /* Busy */
    }

    if ((child->pid == 0) && (!fork_domain_child(child))) {
        /* Cancel all outstanding requests */

        while (request != NULL) {
            /* request might be free'd in the continuation */
            struct winbindd_async_request *next = request->next;
            request->continuation(request->private_data, False);
            request = next;
        }
        return;
    }

    setup_async_write(&child->event, request->request,
              sizeof(*request->request),
              async_main_request_sent, request);

    talloc_destroy(child->mem_ctx);
    return;
}


Then, when I "simulate" some logins via:

#!/bin/sh

max=10
i=1

while true; do
    echo "XP logon # $i"
    # the netlogon share
    echo quit | smbclient -U 'CENTRAL\strack%pw' //dptdpdc/netlogon &
    # the profiles share
    echo quit | smbclient -U 'CENTRAL\strack%pw' //dptdpdc/profiles &
    # another netlogon connection
    echo quit | smbclient -U 'CENTRAL\strack%pw' //dptdpdc/netlogon &
    # a common share
    echo quit | smbclient -U 'CENTRAL\strack%pw' //dptdpdc/ati &

    if [ $i -gt $max ]; then
        echo "Performed $i XP logins"
        sleep 1
        exit
    fi

    i=`expr $i + 1`
done


I get the following output in dptdpdc's logfile:

[2007/02/28 23:36:04, 0] nsswitch/winbindd_dual.c:schedule_async_request(220)
  BUSY!!
[2007/02/28 23:36:04, 0] nsswitch/winbindd_dual.c:schedule_async_request(220)
  BUSY!!
[2007/02/28 23:36:05, 0] nsswitch/winbindd_dual.c:schedule_async_request(220)
  BUSY!!
[2007/02/28 23:36:05, 0] nsswitch/winbindd_dual.c:schedule_async_request(220)
  BUSY!!
[2007/02/28 23:36:05, 0] nsswitch/winbindd_dual.c:schedule_async_request(220)
  BUSY!!
[2007/02/28 23:36:06, 0] nsswitch/winbindd_dual.c:schedule_async_request(220)
  BUSY!!
[2007/02/28 23:36:06, 0] nsswitch/winbindd_dual.c:schedule_async_request(220)
  BUSY!!
[2007/02/28 23:36:06, 0] nsswitch/winbindd_dual.c:schedule_async_request(220)
  BUSY!!


And some of the smbclient calls time out:

tree connect failed: Call timed out: server did not respond after 20000 milliseconds

tree connect failed: Call timed out: server did not respond after 20000 milliseconds


The worker is simply too busy... Is it possible to fork multiple workers
for a domain trust? For example, something like this in winbindd_sid.c:


/* Names starting with "__" are reserved in C, hence no __max_idmap_childs */
#define MAX_IDMAP_CHILDREN 50

/* was: static struct winbindd_child static_idmap_child; */
static struct winbindd_child static_idmap_child[MAX_IDMAP_CHILDREN];

/* unsigned, so the ever-growing counter wraps instead of overflowing */
static unsigned int winbindd_idmap_child_index = 0;

void init_idmap_child(void)
{
    int i;

    for (i = 0; i < MAX_IDMAP_CHILDREN; i++) {
        DEBUG(0, ("Setting up idmap child %d\n", i));
        /* was: setup_domain_child(NULL, &static_idmap_child, "idmap"); */
        setup_domain_child(NULL, &static_idmap_child[i], "idmap");
    }
}

struct winbindd_child *idmap_child(void)
{
    /* plain round-robin; was: return &static_idmap_child; */
    unsigned int idx = winbindd_idmap_child_index++ % MAX_IDMAP_CHILDREN;

    DEBUG(0, ("Returning idmap worker child %u\n", idx));
    return &static_idmap_child[idx];
}

I only had a 15-minute look at the code and I know the idea here is
stupid, but could you suggest how to implement multiple workers correctly
(as a pool)? This way I would also not end up with multiple connections
to the central PDC...

Thank you very much for your interest!

Harald
