LDB python3 strings

Andrew Bartlett abartlet at samba.org
Wed May 2 08:08:38 UTC 2018


On Wed, 2018-05-02 at 08:58 +0100, Noel Power wrote:
> Hi
> On 01/05/18 22:28, Andrew Bartlett via samba-technical wrote:
> > G'Day Noel,
> > 
> > Thanks so much for continuing the python3 work.  This is really
> > important and I'm so glad to be able to pass on the baton here.
> 
> Well, I hope I am not going to be alone in working on this, and I hope
> everyone who was contributing will continue to do so. I don't really have
> the background knowledge (or even the Python skills), but I'm happy to keep
> pushing on as best and as hard as I can.

OK.  I had hoped from the enthusiasm that you might have had a bit more
background, but I still appreciate the efforts.  This has totally
exhausted too many engineers so far!

> > 
> > One thing that came up in a discussion in the Catalyst office regarding
> > this work is worth raising more broadly.
> > 
> > It is exceedingly common in Samba's use of ldb to use:
> > 
> > username = str(res[0]["samAccountName"])
> > 
> > This works because of 
> > 
> > static PyObject *py_ldb_msg_element_str(PyLdbMessageElementObject *self)
> > {
> >         struct ldb_message_element *el = pyldb_MessageElement_AsMessageElement(self);
> > 
> >         if (el->num_values == 1)
> >                 return PyStr_FromStringAndSize((char *)el->values[0].data, el->values[0].length);
> >         else
> >                 Py_RETURN_NONE;
> > }
> 
> Not always :-/ It seems some attributes are not strings, e.g. GUIDs can
> be binary, and the same goes for security descriptors. These can fail with
> str(res[0]["blah"]), as a decode error can easily be raised before the
> Python C code even returns (I've already had to deal with this in my WIP).

Sure, I was kind of happy for that to give an error, as that is just
programmer error. 
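
To illustrate that failure mode, here is a rough sketch in plain python3.
It is not ldb code: the byte values are made up and only stand in for a
binary attribute such as a GUID. The point is that decoding them as UTF-8
raises immediately instead of quietly returning garbage:

```python
# Made-up bytes standing in for a binary attribute value (not real ldb output).
binary_value = b"\xff\xfe\x00\x01"

try:
    binary_value.decode("utf-8")
    raised = False
except UnicodeDecodeError:
    # 0xff can never appear in valid UTF-8, so we always end up here.
    raised = True

print(raised)  # True
```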

> > However equally common is:
> > 
> > username = str(res[0]["samAccountName"][0])
> 
> Probably more common is just the plain res[0]["samAccountName"][0]. I
> think the str doesn't do anything in this case, and the majority of the
> code I have seen doesn't wrap the value in the 'str' function.

yes, but then we have to add a decode(), right?
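
For reference, a minimal python3 sketch of the difference, using a plain
bytes literal as a stand-in for res[0]["samAccountName"][0] (the value is
invented for illustration and no ldb call is made):

```python
# Stand-in for res[0]["samAccountName"][0]; under python3 this is a bytes object.
value = b"username"

as_str = str(value)              # repr-style text, including the b'' wrapper
decoded = value.decode("utf-8")  # the actual text

print(as_str)   # b'username'
print(decoded)  # username
```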

> > 
> > This works because in python2 it just returns the string.  However in
> > python3 I'm told it will return "b'username'" (not so helpful).
> > 
> > As all strings in LDAP are UTF8 (I'm willing to assert that for sanity),
> > I think we need the MessageElement to contain not byte buffers, but a
> > subclass of byte buffers whose string function automatically
> > produces a utf8 string for str().
> 
> not sure exactly what you mean here because doesn't decode provide the
> same functionality?
>    e.g. res[0]["samAccountName"][0].decode('utf8')

Yes, but that means changing a lot of code.

> Or do you mean changing the API so that 'res[0]["samAccountName"][0]' will
> now return an object that provides a 'str' method *and* additionally
> some sort of 'to_bytes' [1] type method? This would mean we would have
> to modify
> 
> -  res[0]["blah"][0]
> +  str(res[0]["blah"][0])
> 
> with the exception of those attributes for which we require binary
> content, where callers would have to change
> 
> -  res[0]['binaryAttr'][0]
> +  res[0]['binaryAttr'][0].to_bytes()
> 
> However, this doesn't really seem to be much less effort than just
> adding the decode where necessary, like
> 
> -  res[0]['blah'][0]
> +  res[0]['blah'][0].decode('utf8')
> 
> Now I readily admit I am not really a Python programmer, nor do I have
> a huge amount of knowledge of the Samba Python API, so I guess I am
> missing something?

I was sort of hoping it would be some kind of weird polymorphic thing
that behaved like a string or bytes in the same way python2 did, given
we know it is utf8 if it is string-ish.
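
A rough sketch of what such a thing could look like in pure python3. The
class name Utf8Bytes is hypothetical and this is not how ldb implements
anything; it only shows that a bytes subclass can override str() while
still being usable as bytes:

```python
class Utf8Bytes(bytes):
    """Hypothetical bytes subclass whose str() is the UTF-8 decoding."""

    def __str__(self):
        # Raises UnicodeDecodeError for binary values, i.e. calling str()
        # on a binary attribute stays a loud programmer error.
        return self.decode("utf-8")


name = Utf8Bytes(b"username")       # invented text value
blob = Utf8Bytes(b"\x01\x05\xff")   # invented binary value

print(str(name))                # username
print(isinstance(name, bytes))  # True, so bytes-based callers still work
print(bytes(blob))              # b'\x01\x05\xff', the raw bytes are unchanged
print(name == "username")       # False
```

The last line shows the limit of this approach: python3 never treats bytes
and str as equal, so comparisons against text literals would still need an
explicit str() or decode().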

> Also, if anyone has an easy list of which attributes definitely have
> binary content, that would be useful.

I don't think we can assert that, but there are conventions. 

> > 
> > Do you think you could have a look at that?  Otherwise, converting
> > samba-tool and our other ldb-calling code is going to get very tricky.
> 
> Yep, I am already experiencing that. I've already converted a chunk of
> the samba_tool tests (those exercising the API) to python3 (you can see
> the progress at https://github.com/samba-team/samba/pull/161 - please note,
> this is a WIP branch; the pull request exists only for visibility and CI
> exposure). The string/binary issue around attributes is annoying. I'd
> welcome any more input, suggestions or other possible solutions there.

OK, I'll try to find some time, and I'll ask Joe to keep looking at
this; he has the strong python background that is critical here.

Thanks!

Andrew Bartlett

> Noel
> 
> [1] I expected python3 to provide a 'tp_bytes' type C-function hook,
> since afaik in native Python you can define a '__bytes__' method. However,
> no such hook seems to exist.
> 
-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba



