Porting Samba's CPython extensions to Python 3

Fri Sep 4 03:04:30 UTC 2015

On Fri, 2015-08-28 at 12:57 +0200, Petr Viktorin wrote:
> Hello,
> Sorry for this long mail: a lot has happened since the last discussions,
> and I need to refresh some points buried in the e-mail thread here:
> https://lists.samba.org/archive/samba-technical/2015-March/106177.html
> 
> 
> In previous discussions, we agreed on a strategy for porting Samba to
> Python 3. The stand-alone libraries would get a supported Python 3 port.
> Patches for the rest of Samba would be tolerated if they do not
> inconvenience other developers, and they would be unsupported (if it
> breaks, it's on whoever cares about Python 3 to fix it).
> 
> With the patches for the last stand-alone library reviewed, I think it's
> time to revive that discussion, to get a better idea of how porting
> Samba to Python 3 should work.
> Specifically, I'd like to come to understand what would least
> inconvenience you, while allowing some kind of progress on this front.
> 
> In the mentioned thread, there is an idea that there is no rush to port
> – Python 2 will be around for another five years.
> But, while five years is a lot of time, if we spend time waiting there
> *will* be a rush later. I'm trying to avoid that. If five years is an
> absolute deadline for porting to py3, testing, and removing support for
> py2, I think it does make sense to start.
> In particular, waiting until enterprise Linux distributions switch to
> Python 3 creates a Catch-22 that would most likely result in everyone
> waiting till the last possible moment, and then rushing wildly. Like
> Samba, a distribution wants to switch all at once; but to do that the
> code must be ready.

While I'm less convinced about the timing from that perspective,
I am convinced that I would rather keep working with you than start
this again in a few years.

I really do appreciate your patience and dedication to handling this
difficult area.

> Moving from the "when" to the "how":
> 
> Generally, there is opposition to a bespoke compatibility layer,
> which could not be tested well and would not get much use beyond Samba.
> As with any code written by one external developer, if I got hit by a
> bus, the compatibility layer could bitrot.
> 
> However, *some* kind of a compatibility layer is needed.
> The string type in Python 2 was split into "bytes" and "unicode", and
> there is a need to either differentiate these two, or use unicode
> everywhere in Python 2 (which would change the semantics of the Python 2
> version, which is not practical for a project of Samba's size).
> So, my approach is to differentiate between three kinds of strings:
> - bytes (PyBytes; called "str" in py2, "bytes" in py3)
> - native ("PyStr"; UTF-8 encoded "str" in py2; "str" in py3)
> - text ("PyUnicode"; called "unicode" in py2, "str" in py3)
> This string split is *the* difficult part of porting C extensions.
> Compared to this, other decisions are fairly trivial: either use the py2
> spelling or the py3 spelling of the same thing, and choose a point on
> the spectrum between shared macros and inline #ifdefs.
> Correspondingly, aside from the bytes/text split, the rest of the
> porting process is largely mechanical.
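
As a Python-level illustration of the three kinds of strings listed above,
here is a minimal sketch. It is not taken from Samba or py3c: the names
to_native, text_type, bytes_type and native_type are hypothetical, and UTF-8
is assumed as the encoding, matching the "UTF-8 encoded str" note in the list.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str        # "text": unicode code points
    bytes_type = bytes     # "bytes": raw octets
else:
    text_type = unicode    # noqa: F821 (this name exists only on Python 2)
    bytes_type = str

# "native" is whatever the interpreter itself calls str:
# UTF-8 encoded bytes on Python 2, text on Python 3.
native_type = str


def to_native(value, encoding="utf-8"):
    """Convert a bytes or text value to the interpreter's native str.

    Hypothetical helper, for illustration only.
    """
    if isinstance(value, native_type):
        return value
    if PY3:
        # On Python 3 the only non-native kind is bytes: decode it.
        return value.decode(encoding)
    # On Python 2 the only non-native kind is unicode: encode it.
    return value.encode(encoding)
```

On Python 3, to_native(b"samba") and to_native("samba") both return the str
"samba"; on Python 2 both results would be byte strings instead.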

I don't understand, however, why we need the PyStr_FromString macros,
given that we didn't need them for Ldb?

> The ideal solution for Samba would be if a compatibility layer was
> distributed with Python itself. Unfortunately, this can't really work:
> no features are added to Python 2.7 any more, and even if they were,
> they couldn't be present in older 2.7 releases.
> 
> Realistically, I see three options for Samba, if it decides to start
> porting:
> 
> 1) Include relevant macros in the files that need them. This is used in
> the stand-alone libraries (which typically have one Python module each).
> This makes the code clear to anyone who knows the C-API for Python 2 or 3;
> but when adding new macros it requires some care to keep them consistent.
> 
> 2) Put all compatibility macros in a shared header. This obscures the
> code somewhat, with an additional header to know about, but ensures that
> the set of macros is the same throughout the project, and allows
> documenting them fairly easily.

I think we have to do this, or 3).  We have a strong preference against
duplicated functions and macros. 

> 3) Use a third-party library for the compatibility macros. This way, the
> compat layer can be shared with other projects; it also makes it easier
> to keep it tested and documented.

We would prefer that, and we can import that as a third_party codebase.

> Regardless of which option is chosen, I have a pretty good idea about
> what a compatibility layer would look like.
> I have written a tested, documented library called py3c [0] that
> contains all the necessary macros. To ensure consistency, this is where
> I've been pulling macros from when porting the stand-alone libraries.
> The library is not officially recognized by Python upstream (their first
> suggestion nowadays would probably be to port to Cython or CFFI).

Yes, that would be a big change.  Thanks for not suggesting that :-)

> But, I
> am in the process of absorbing parts of Python's C Porting Howto [1].
> 
> A superset of the macros I'd need for Samba are at:
> https://github.com/encukou/py3c/blob/master/include/py3c/compat.h
> 
> The first part is specific to the porting strategy I use for Samba; it
> boils down to "use PyStr for native strings":
> 
> * PyStr_* maps to PyString_* (py2) or PyUnicode_* (py3)
> * Python 2: PyBytes_* maps to PyString_*
> (You can ignore the static function PyStr_Concat, this wart is not
> needed for Samba.)
> 
> The rest emulates py2 or py3 API in the other Python.
> (Unfortunately I can't use a single Python's API for both.)
> 
> * Python 3: PyInt_* maps to PyLong_*
> * Module initialization uses the py3 syntax (except the function
> declaration – "MODULE_INIT_FUNC(name)" instead of "static PyObject
> *PyInit_name(void)").

So, the reason for the PyStr_ stuff is to avoid having an accidental
PyBytes -> PyString -> PyUnicode mapping, either in the compiler or in
someone's head?

> I have gone through Samba's C extensions and am reasonably sure this is
> a superset of the compat layer needed to port them all. (Two exceptions
> – PyFile_AsFile and PyCObject – are better dealt with individually.)
> 
> 
> I'm attaching draft patches that port "samba.netbios" using options 1
> (inline macros) and 2 (shared header). (For the shared header,
> additional buildsystem integration would be needed, and possibly a
> better location for the header.)

The PyStr stuff grates with me a bit, but I guess that's OK.  Others
may have stronger views however.  It looks harmless enough. 

> Let me know if you have any thoughts on this matter. And, thank you for
> your continued patience.

Thanks for your continued work on this.

The key will be finding all the right places to deal with PyString_* in
the generated headers, but PIDL knows what things are Unicode because
it has a charset annotation. 

> [0] http://py3c.readthedocs.org
> [1] http://bugs.python.org/issue24937
> [2] http://py3c.readthedocs.org/en/latest/defs.html

Thanks,

Andrew Bartlett

-- 
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT   
https://catalyst.net.nz/services/samba