LDB python3 strings

Noel Power nopower at suse.com
Wed May 2 12:01:30 UTC 2018

On 02/05/18 09:08, Andrew Bartlett wrote:
> On Wed, 2018-05-02 at 08:58 +0100, Noel Power wrote:
>> Hi
>> On 01/05/18 22:28, Andrew Bartlett via samba-technical wrote:
>>> G'Day Noel,
>>> Thanks so much for continuing the python3 work.  This is really
>>> important and I'm so glad to be able to pass on the baton here.
>> Well I hope I am not going to be alone in working on this and I hope
>> everyone who was also contributing will still do so, I don't really have
>> the background knowledge (or even python skills) but I'm happy to keep
>> pushing on as best and as hard as I can
> OK.  I had hoped from the enthusiasm that you might have had a bit more
> background, but I still appreciate the efforts.  This has totally
> exhausted too many engineers so far!
I think with some 'enthusiasm', communication, cooperation + review we
can achieve alot :-) It's pretty hard to know everything required for
this, python experts typically know little about samba, even people who
know samba may not know the python apis :-). I know just a little about
alot ;-) I'm hoping in this case that it is more useful than knowing
alot about little.
>>> One thing that came up in a discussion in the Catalyst office regarding
>>> this work is worth raising more broadly.
>>> It is exceedingly common in Samba's use of ldb to use:
>>> username = str(res[0]["samAccountName"])
>>> This works because of 
>>> static PyObject *py_ldb_msg_element_str(PyLdbMessageElementObject *self)
>>> {
>>>         struct ldb_message_element *el = pyldb_MessageElement_AsMessageElement(self);
>>>         if (el->num_values == 1)
>>>                 return PyStr_FromStringAndSize((char *)el->values[0].data, el->values[0].length);
>>>         else
>>>                 Py_RETURN_NONE;
>>> }
>> Not always :-/ It seems some attributes are not strings e.g. guids can
>> be binary also same for security descriptors. These can fail with
>> str(res[0]["blah"]) as there could easily be a decode error before even
>> the py c code returns (I've even had to deal with this in my WIP)
>>> This works because in python2 it just returns the string.  However in
>>> python3 I'm told it will return "b'username'" (no so helpful).
>>> As all strings in LDAP are UTF8 (I'm willing to assert that for sanity)
>>> I think we need the MessageElement to contain not byte buffers, but a
>>> subclass of byte buffers that have a string function that converts
>>> automatically produces a utf8 string for str().
>> not sure exactly what you mean here because doesn't decode provide the
>> same functionality?
>>    e.g. res[0]["samAccountName"][0].decode('utf8')
> Yes, but that means changing a lot of code.
I am not sure that can be avoided, it might not be a bad thing for the
code to be changed so that string/byte usage is correct and unambiguous
(using the correct encoding/decoding etc).
>> or do you mean change the api so that 'res[0]["samAccountName"][0]' will
>> now return an object that provides a 'str' method *and* additionally
>> some sort or a 'to_bytes' [1] type method this would mean we would have
>> to modify
>> -  res[0]["blah"][0]'
>> +  str(res[0]["blah"][0])'
>> with the exception of those attributes that we require binary content
>> for where they would have to
>> -  res[0]['binaryAttr'][0]
>> + res[0]['binaryAttr'][0].to_bytes()'
>> However there doesn't seem really to be much difference in effort here
>> than just adding the decode where necessary like
>> -  res[0]['blah'][0]
>> + res[0]['blah][0].decode('uft8')
>> Now I readily admit I am not really a python programmer nor have really
>> a huge amount of knowledge of the samba python api so I guess I am
>> missing something ?
> I was sort of hoping it would be some kind of weird polymorphic thing
> that behaved like a string or bytes in the same way python2 did given
> we know it is utf8 if string-ish. 
I don't see how we can get away with the fact that in python3 there are
hard code boundaries with various api's (native python & samba) that
expect either 'bytes' or a 'str'. The decision needs to be made before
interacting with these apis to provide the correct type. I don't see how
this can be done automagically (but if someone can point to a way that
can be done I am all ears)
>> Also if anyone has an easy list of what attributes definitely have
>> binary content that would be useful
> I don't think we can assert that, but there are conventions.
Assuming code needs to be manually adjusted this information would be
useful in order to identify those places that need to be changed
>>> Do you think you could have a look at that?  Otherwise, converting
>>> samba-tool and our other ldb-calling code is going to get very tricky.
>> yep, I am already experiencing that, I've already converted a hunk of
>> the samba_tool tests (those exercising the api) to python3 (you can see
>> the progress https://github.com/samba-team/samba/pull/161 - please note,
>> this is a WIP branch, there's only a pull request for visibility and CI
>> exposure) The string/binary issue around attributes is annoying. I'd
>> welcome any more input, suggestions or other possible solution there.
I have to stress that this is WIP, typically I am trying to work through
the tests, get them working (under CI) and then come back to revaluate
the changes needed, try and identify generic changes and then
reorganise/rebase + pull groups of these changes out for review in
separate pull requests/mails to the list.
> OK, I'll try and find some time and I'll ask Joe to keep up looking at
> this, he has the strong python background that is critical here.
> Thanks!
> Andrew Bartlett
thanks again

More information about the samba-technical mailing list