LDB python3 strings

Wed May 2 07:58:51 UTC 2018

Hi
On 01/05/18 22:28, Andrew Bartlett via samba-technical wrote:
> G'Day Noel,
>
> Thanks so much for continuing the python3 work.  This is really
> important and I'm so glad to be able to pass on the baton here.
Well, I hope I am not going to be alone in working on this, and I hope
everyone who was contributing will continue to do so. I don't really have
the background knowledge (or even the python skills), but I'm happy to keep
pushing on as best and as hard as I can.
>
> One thing that came up in a discussion in the Catalyst office regarding
> this work is worth raising more broadly.
>
> It is exceedingly common in Samba's use of ldb to use:
>
> username = str(res[0]["samAccountName"])
>
> This works because of 
>
> static PyObject *py_ldb_msg_element_str(PyLdbMessageElementObject *self)
> {
>         struct ldb_message_element *el = pyldb_MessageElement_AsMessageElement(self);
>
>         if (el->num_values == 1)
>                 return PyStr_FromStringAndSize((char *)el->values[0].data, el->values[0].length);
>         else
>                 Py_RETURN_NONE;
> }
Not always :-/ It seems some attributes are not strings, e.g. guids can
be binary, and the same goes for security descriptors. These can fail with
str(res[0]["blah"]), as there could easily be a decode error before the
py c code even returns (I've already had to deal with this in my WIP).
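
To illustrate the kind of failure I mean, here is a minimal sketch in plain
Python 3 with no ldb involved. The byte values are made up to stand in for a
binary GUID, and try_decode is just a hypothetical helper for this example:

```python
# Illustration only: plain Python 3 bytes, no ldb involved.
# These 16 bytes are invented to stand in for a binary GUID value.
fake_guid = bytes([0xF0, 0x9F, 0xFF, 0xFE]) + bytes(12)


def try_decode(value):
    """Return the decoded text, or None if the value is not valid UTF-8."""
    try:
        return value.decode("utf8")
    except UnicodeDecodeError:
        return None


text_result = try_decode(b"Administrator")   # textual attribute value
binary_result = try_decode(fake_guid)        # binary attribute value

print(text_result)
print(binary_result)
```

A textual value decodes fine, while the binary one raises UnicodeDecodeError,
which the helper turns into None.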
>
> However equally common is:
>
> username = str(res[0]["samAccountName"][0])
Probably more common is just the plain res[0]["samAccountName"][0]. I
don't think the str does anything in this case, and the majority of the
code I have seen doesn't wrap the value in the 'str' function.
>
> This works because in python2 it just returns the string.  However in
> python3 I'm told it will return "b'username'" (not so helpful).
>
> As all strings in LDAP are UTF8 (I'm willing to assert that for sanity)
> I think we need the MessageElement to contain not byte buffers, but a
> subclass of byte buffers that has a string function that
> automatically produces a utf8 string for str().
Not sure exactly what you mean here, because doesn't decode already
provide the same functionality?
   e.g. res[0]["samAccountName"][0].decode('utf8')
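
For reference, a small plain Python 3 sketch of the difference between str()
and decode on a bytes value. The literal b"username" is only a stand-in for
what res[0]["samAccountName"][0] would hold; no ldb is involved:

```python
# Stand-in for the bytes value an ldb message element would hold.
value = b"username"

as_str = str(value)              # str() on bytes gives the repr-style form
decoded = value.decode("utf8")   # decode gives the actual text

print(as_str)    # b'username'
print(decoded)   # username
```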

Or do you mean change the api so that 'res[0]["samAccountName"][0]' will
now return an object that provides a 'str' method *and* additionally
some sort of 'to_bytes' [1] type method? This would mean we would have
to modify

-  res[0]["blah"][0]
+  str(res[0]["blah"][0])

with the exception of those attributes where we require the binary
content, which would have to change like this:

-  res[0]['binaryAttr'][0]
+  res[0]['binaryAttr'][0].to_bytes()
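
As a rough sketch of how I read the suggestion, in pure Python rather than in
the C bindings: the class name Utf8Bytes is hypothetical and nothing like it
exists in pyldb as far as I know. Because it subclasses bytes, the plain
bytes() builtin can stand in for a 'to_bytes' method here:

```python
class Utf8Bytes(bytes):
    """Hypothetical bytes subclass whose str() decodes the value as UTF-8."""

    def __str__(self):
        return self.decode("utf8")


# Stand-in for what res[0]["blah"][0] might return under such an api.
value = Utf8Bytes(b"username")

as_text = str(value)    # decoded text instead of "b'username'"
as_raw = bytes(value)   # raw bytes are still available

print(as_text)
print(as_raw == b"username")
```

Note that str() on a value that is not valid UTF-8 (e.g. a binary guid) would
still raise UnicodeDecodeError with this sketch, so callers would still need
to know which attributes are binary.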

However, there doesn't really seem to be much difference in effort here
compared with just adding the decode where necessary, like

-  res[0]['blah'][0]
+  res[0]['blah'][0].decode('utf8')

Now I readily admit I am not really a python programmer, nor do I have
a huge amount of knowledge of the samba python api, so I guess I am
missing something?

Also, if anyone has an easy list of which attributes definitely have
binary content, that would be useful.
>
> Do you think you could have a look at that?  Otherwise, converting
> samba-tool and our other ldb-calling code is going to get very tricky.
Yep, I am already experiencing that. I've already converted a chunk of
the samba_tool tests (those exercising the api) to python3. You can see
the progress at https://github.com/samba-team/samba/pull/161 (please note
this is a WIP branch; the pull request is only there for visibility and CI
exposure). The string/binary issue around attributes is annoying. I'd
welcome any more input, suggestions or other possible solutions there.

Noel

[1] I expected python3 to provide a 'tp_bytes' type c-function hook, since
afaik in native python you can define a '__bytes__' method. However, there
doesn't seem to be such a hook.
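
For comparison, this is what I mean by defining '__bytes__' in native python.
It is only a sketch: the Element class is hypothetical and is not part of the
ldb api:

```python
class Element:
    """Hypothetical wrapper around a raw value, not part of the ldb api."""

    def __init__(self, raw):
        self._raw = raw

    def __bytes__(self):
        # Called by the bytes() builtin.
        return self._raw

    def __str__(self):
        # Called by the str() builtin.
        return self._raw.decode("utf8")


el = Element(b"username")
raw = bytes(el)
text = str(el)

print(raw)
print(text)
```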