LDB python3 strings

William Brown william at blackhats.net.au
Wed May 2 21:38:15 UTC 2018


On Wed, 2018-05-02 at 13:01 +0100, Noel Power via samba-technical
wrote:
> On 02/05/18 09:08, Andrew Bartlett wrote:
> > On Wed, 2018-05-02 at 08:58 +0100, Noel Power wrote:
> > > Hi
> > > On 01/05/18 22:28, Andrew Bartlett via samba-technical wrote:
> > > > G'Day Noel,
> > > > 
> > > > Thanks so much for continuing the python3 work.  This is really
> > > > important and I'm so glad to be able to pass on the baton here.
> > > 
> > > Well I hope I am not going to be alone in working on this and I
> > > hope
> > > everyone who was also contributing will still do so, I don't
> > > really have
> > > the background knowledge (or even python skills) but I'm happy to
> > > keep
> > > pushing on as best and as hard as I can
> > 
> > OK.  I had hoped from the enthusiasm that you might have had a bit
> > more
> > background, but I still appreciate the efforts.  This has totally
> > exhausted too many engineers so far!
> 
> I think with some 'enthusiasm', communication, cooperation + review
> we
> can achieve alot :-) It's pretty hard to know everything required for
> this, python experts typically know little about samba, even people
> who
> know samba may not know the python apis :-). I know just a little
> about
> alot ;-) I'm hoping in this case that it is more useful than knowing
> alot about little.

I've just come out of a python2 -> python3 work for an ldap library in
389 Directory Server and can agree it's a lot of work, but with some
"community spirit" you can achieve a lot indeed.

> > > > One thing that came up in a discussion in the Catalyst office
> > > > regarding
> > > > this work is worth raising more broadly.
> > > > 
> > > > It is exceedingly common in Samba's use of ldb to use:
> > > > 
> > > > username = str(res[0]["samAccountName"])
> > > > 
> > > > This works because of 
> > > > 
> > > > static PyObject
> > > > *py_ldb_msg_element_str(PyLdbMessageElementObject *self)
> > > > {
> > > >         struct ldb_message_element *el =
> > > > pyldb_MessageElement_AsMessageElement(self);
> > > > 
> > > >         if (el->num_values == 1)
> > > >                 return PyStr_FromStringAndSize((char *)el-
> > > > >values[0].data, el->values[0].length);
> > > >         else
> > > >                 Py_RETURN_NONE;
> > > > }
> > > 
> > > Not always :-/ It seems some attributes are not strings e.g.
> > > guids can
> > > be binary also same for security descriptors. These can fail with
> > > str(res[0]["blah"]) as there could easily be a decode error
> > > before even
> > > the py c code returns (I've even had to deal with this in my WIP)
> > 
> >  
> 
> [...]
> > > > This works because in python2 it just returns the
> > > > string.  However in
> > > > python3 I'm told it will return "b'username'" (no so helpful).
> > > > 
> > > > As all strings in LDAP are UTF8 (I'm willing to assert that for
> > > > sanity)
> > > > I think we need the MessageElement to contain not byte buffers,
> > > > but a
> > > > subclass of byte buffers that have a string function that
> > > > converts
> > > > automatically produces a utf8 string for str().
> > > 
> > > not sure exactly what you mean here because doesn't decode
> > > provide the
> > > same functionality?
> > >    e.g. res[0]["samAccountName"][0].decode('utf8')
> > 
> > Yes, but that means changing a lot of code.
> 
> I am not sure that can be avoided, it might not be a bad thing for
> the
> code to be changed so that string/byte usage is correct and
> unambiguous
> (using the correct encoding/decoding etc).

This is a really annoying one to solve. In the lib389 code instead of
putting 'decode' everywhere, we actually added some wrappers onto our
object management code that allowed you to get a type in the manner you
expect.

get_utf8
get_bytes
get_int

Samba 4 could really use:

get_ndr_unpacked()

Which would help to shortcut all the ndr_unpack calls too :) 

This was really helpful because there *are* cases where you want bytes,
utf8 or int - perhaps others.


So instead, you could change this to:

res[0].get_utf8('samaccountname')[0]

As an idea.


> > > or do you mean change the api so that
> > > 'res[0]["samAccountName"][0]' will
> > > now return an object that provides a 'str' method *and*
> > > additionally
> > > some sort or a 'to_bytes' [1] type method this would mean we
> > > would have
> > > to modify
> > > 
> > > -  res[0]["blah"][0]'
> > > +  str(res[0]["blah"][0])'
> > > 
> > > with the exception of those attributes that we require binary
> > > content
> > > for where they would have to
> > > 
> > > -  res[0]['binaryAttr'][0]
> > > + res[0]['binaryAttr'][0].to_bytes()'
> > > 
> > > However there doesn't seem really to be much difference in effort
> > > here
> > > than just adding the decode where necessary like
> > > 
> > > -  res[0]['blah'][0]
> > > + res[0]['blah][0].decode('uft8')
> > > 
> > > Now I readily admit I am not really a python programmer nor have
> > > really
> > > a huge amount of knowledge of the samba python api so I guess I
> > > am
> > > missing something ?
> > 
> > I was sort of hoping it would be some kind of weird polymorphic
> > thing
> > that behaved like a string or bytes in the same way python2 did
> > given
> > we know it is utf8 if string-ish. 
> 
> I don't see how we can get away with the fact that in python3 there
> are
> hard code boundaries with various api's (native python & samba) that
> expect either 'bytes' or a 'str'. The decision needs to be made
> before
> interacting with these apis to provide the correct type. I don't see
> how
> this can be done automagically (but if someone can point to a way
> that
> can be done I am all ears)

You can use mypy which is a strict type checker for python to help
enforce some of these at "compile time".

> > 
> > > Also if anyone has an easy list of what attributes definitely
> > > have
> > > binary content that would be useful
> > 
> > I don't think we can assert that, but there are conventions.
> 
> Assuming code needs to be manually adjusted this information would be
> useful in order to identify those places that need to be changed

Isn't this able to be determined from the schema ....

Anyway, as above: the best solution we found was just to have the
client request the "type it expected", and if it's wrong ... well,
should have asked for those bytes as an int not a string (for example).

> >  
> > 
> > > > Do you think you could have a look at that?  Otherwise,
> > > > converting
> > > > samba-tool and our other ldb-calling code is going to get very
> > > > tricky.
> > > 
> > > yep, I am already experiencing that, I've already converted a
> > > hunk of
> > > the samba_tool tests (those exercising the api) to python3 (you
> > > can see
> > > the progress https://github.com/samba-team/samba/pull/161 -
> > > please note,
> > > this is a WIP branch, there's only a pull request for visibility
> > > and CI
> > > exposure) The string/binary issue around attributes is annoying.
> > > I'd
> > > welcome any more input, suggestions or other possible solution
> > > there.
> 
> I have to stress that this is WIP, typically I am trying to work
> through
> the tests, get them working (under CI) and then come back to
> revaluate
> the changes needed, try and identify generic changes and then
> reorganise/rebase + pull groups of these changes out for review in
> separate pull requests/mails to the list.
> > OK, I'll try and find some time and I'll ask Joe to keep up looking
> > at
> > this, he has the strong python background that is critical here.

I'm happy to look at an review these also :) 

> > 
> > Thanks!
> > 
> > Andrew Bartlett
> 
> thanks again
> Noel
> 



More information about the samba-technical mailing list