LDB python3 strings

Tue May 8 22:33:14 UTC 2018

On 09/05/18 06:25, Noel Power via samba-technical wrote:
> On 02/05/18 22:38, William Brown wrote:
>> On Wed, 2018-05-02 at 13:01 +0100, Noel Power via samba-technical
>> wrote:
> [...]
>> This is a really annoying one to solve. In the lib389 code instead of
>> putting 'decode' everywhere, we actually added some wrappers onto our
>> object management code that allowed you to get a type in the manner you
>> expect.
>>
>> get_utf8
>> get_bytes
>> get_int
> Like Andrew said wrt. adding 'decode', this will mean changing a lot of
> code. My main concern is losing the distinction between text and binary
> types in whatever changes to the api we provide.
> Also, we are not just porting code to python3; we are porting the code to
> be python2/python3 compatible, and this makes things a little trickier.
> 
>>
> [...]
>> So instead, you could change this to:
>>
>> res[0].get_utf8('samaccountname')[0]
> Andrew, if the api was to change to provide (by default) utf8 strings for
>    res[0]["samAccountName"][0]
> 
> then I assume you mean that
> in python2
>    res[0]["samAccountName"][0] will now return 'unicode'
> in python3
>    res[0]["samAccountName"][0] will return 'str'
>  
> if so then I was thinking of something along the lines of providing a
> '.raw' accessor might be more appropriate for example
> 
>   res[0].raw('blah')[0]
> 
> where 'raw' would give access to type 'str' in python2 (like the old
> api) and in python3 would return 'bytes' like it does currently.
> 
> This approach provides some sort of distinction between binary/text data
> in both python2/python3 with hopefully minimal changes. It should
> prepare any python2 code appropriately. Of course it will break some
> things (unfortunately including some commits of mine recently pushed to
> master) but that is unfortunately unavoidable. I'd be hopeful that the
> true usage of 'binary' attribute values is quite limited so this change
> should only require modifications to access the 'raw' attributes in just
> a few places and that in the long term this way is a better choice.
> 
> So, I think this is sort of where you were going? However, I'd really
> like to be sure that such a change to the api, which we will have to live
> with, is appropriate. Do others with experience in this area of this core
> api agree with such a change or see any problems? I don't feel I know
> enough about the ldb usage, but I'm happy to do the change if this is
> felt to be the right way to go. In any case I am going to try to
> implement something like the above to see what kind of fallout there
> will be in the code.
> 
> 
> Noel
> 
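
For reference, the two accessor styles quoted above could look roughly
like this in pure Python. This is only a sketch: FakeMessage and its
dict-of-lists storage are made up for illustration and are not the ldb
API, and the attribute values are invented. get_utf8, get_bytes, get_int
and raw are just the names floated in this thread.

```python
class FakeMessage(object):
    """Hypothetical stand-in for an ldb message (NOT the real ldb API).

    Values are stored as lists of byte strings, keyed by the
    lower-cased attribute name.
    """

    def __init__(self, attrs):
        self._attrs = dict((k.lower(), list(v)) for k, v in attrs.items())

    def raw(self, name):
        # Binary access: 'str' on Python 2, 'bytes' on Python 3.
        return self._attrs[name.lower()]

    # William's naming for the same binary access.
    get_bytes = raw

    def get_utf8(self, name):
        # Text access: 'unicode' on Python 2, 'str' on Python 3.
        return [v.decode('utf8') for v in self.raw(name)]

    def get_int(self, name):
        return [int(v.decode('utf8')) for v in self.raw(name)]

    def __getitem__(self, name):
        # Assumption: the default accessor hands back text, as in
        # Noel's reading of Andrew's suggestion.
        return self.get_utf8(name)


msg = FakeMessage({
    'sAMAccountName': [b'alice'],
    'objectGUID': [b'\x00\xff\x10\x20'],
    'userAccountControl': [b'512'],
})

name = msg['samAccountName'][0]       # text
guid = msg.raw('objectGUID')[0]       # untouched binary value
uac = msg.get_int('userAccountControl')[0]
```

The point of keeping raw() separate is that binary attributes never pass
through a decode step, so the text/binary distinction survives on both
Python 2 and Python 3.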

What I thought Andrew meant is a C implementation of exactly this:

class LdbBytes(bytes):
    # Behaves as bytes everywhere except str(), which decodes as UTF-8.
    def __str__(self):
        return self.decode('utf8')

which would be bytes in all but stringification, as follows:

x = LdbBytes(b'abc')
str(x)               # 'abc'

b = b'abc'
str(b)               # "b'abc'"

x == b               # True
isinstance(x, bytes) # True

repr(x)              # "b'abc'"

It makes things very easy now, at the risk of causing other parties to
swear loudly in the future. Progress as usual in software.
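
To make the trade-off concrete, here is a small Python 3 sketch of where
such a class helps and where it could surprise people. The attribute
values are invented, and the behaviour shown follows from the pure-Python
class itself, not from any actual ldb change.

```python
class LdbBytes(bytes):
    """bytes subclass whose str() decodes as UTF-8 (Python 3 only)."""

    def __str__(self):
        return self.decode('utf8')


name = LdbBytes(b'Administrator')

# The easy part: existing string formatting keeps working unchanged.
percent = 'account: %s' % name        # 'account: Administrator'
braces = 'account: {}'.format(name)   # 'account: Administrator'

# Surprise 1: bytes operations hand back plain bytes, so the special
# stringification is silently lost after a slice.
prefix = name[:5]
prefix_type = type(prefix)            # bytes, not LdbBytes
prefix_text = str(prefix)             # "b'Admin'"

# Surprise 2: a genuinely binary value blows up only when someone
# stringifies it, possibly far away from where it was fetched.
guid = LdbBytes(b'\xff\xfe\x00\x01')
try:
    str(guid)
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Neither surprise is an argument against the idea on its own, but both are
the kind of thing that shows up later in code that assumed it was holding
either plain bytes or a real str.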

It would be nice to be able to piggyback the Python 3 API break as an
excuse to breakingly improve our Python APIs, but we can't easily do
that while still supporting Py2.

Douglas