Porting Samba's CPython extensions to Python 3
Andrew Bartlett
abartlet at samba.org
Fri Sep 4 03:04:30 UTC 2015
On Fri, 2015-08-28 at 12:57 +0200, Petr Viktorin wrote:
> Hello,
> Sorry for this long mail: a lot has happened since the last
> discussions,
> and I need to refresh some points buried in the e-mail thread here:
> https://lists.samba.org/archive/samba-technical/2015-March/106177.html
>
>
> In previous discussions, we agreed on a strategy for porting Samba to
> Python 3. The stand-alone libraries would get a supported Python 3
> port.
> Patches for the rest of Samba would be tolerated if they do not
> inconvenience other developers, and they would be unsupported (if it
> breaks, it's on whoever cares about Python 3 to fix it).
>
> With the patches for the last stand-alone library reviewed, I think
> it's
> time to revive that discussion, to get a better idea of how porting
> Samba to Python 3 should work.
> Specifically, I'd like to come to understand what would least
> inconvenience you, while allowing some kind of progress on this
> front.
>
> In the mentioned thread, there is an idea that there is no rush to
> port
> – Python 2 will be around for another five years.
> But, while five years is a lot of time, if we spend time waiting
> there
> *will* be a rush later. I'm trying to avoid that. If five years is an
> absolute deadline for porting to py3, testing, and removing support
> for
> py2, I think it does make sense to start.
> In particular, waiting until enterprise Linux distributions switch to
> Python 3 creates a Catch-22 that would most likely result in everyone
> waiting till the last possible moment, and then rushing wildly. Like
> Samba, a distribution wants to switch all at once; but to do that the
> code must be ready.
While I'm not as convinced about the timing from that perspective,
I am convinced that I would rather keep working with you than start
this again in a few years.
I really do appreciate your patience and dedication to handling this
difficult area.
> Moving from the "when" to the "how":
>
> Generally, there is opposition against a bespoke compatibility layer,
> which could not be tested well and would not get much use beyond
> Samba.
> As with any code written by one external developer, if I got hit by a
> bus, the compatibility layer could bitrot.
>
> However, *some* kind of a compatibility layer is needed.
> The string type in Python 2 was split into "bytes" and "unicode", and
> there is a need to either differentiate these two, or use unicode
> everywhere in Python 2 (which would change the semantics of the
> Python 2
> version, which is not practical for a project of Samba's size).
> So, my approach is to differentiate between three kinds of strings:
> - bytes (PyBytes; called "str" in py2, "bytes" in py3)
> - native ("PyStr"; UTF-8 encoded "str" in py2; "str" in py3)
> - text ("PyUnicode"; called "unicode" in py2, "str" in py3)
> This string split is *the* difficult part of porting C extensions.
> Compared to this, other decisions are fairly trivial: either use the
> py2
> spelling or the py3 spelling of the same thing, and choose a point on
> the spectrum between shared macros or inline #ifdefs.
> Correspondingly, aside from the bytes/text split, the rest of the
> porting process is largely mechanical.
I don't understand why we need the PyStr_FromString macros however,
given we didn't need them for Ldb?
> The ideal solution for Samba would be if a compatibility layer was
> distributed with Python itself. Unfortunately, this can't really
> work:
> no features are added to Python 2.7 any more, and even if they were,
> they couldn't be present in older 2.7 releases.
>
> Realistically, I see three options for Samba, if it decides to start
> porting:
>
> 1) Include relevant macros in the files that need them. This is used
> in
> the stand-alone libraries (which typically have one Python module
> each).
> This makes the code clear to anyone who knows C-API for Python 2 or
> 3;
> but when adding new macros it requires some care to have consistency.
>
> 2) Put all compatibility macros in a shared header. This obscures the
> code somewhat, with an additional header to know about, but ensures
> that
> the set of macros is the same throughout the project, and allows
> documenting them fairly easily.
I think we have to do this, or 3). We have a strong preference against
duplicated functions and macros.
> 3) Use a third-party library for the compatibility macros. This way,
> the
> compat layer can be shared with other projects; it also makes it
> easier
> to keep it tested and documented.
We would prefer that, and we can import that as a third_party codebase.
> Regardless of which option is chosen, I have a pretty good idea about
> what a compatibility layer would look like.
> I have written a tested, documented library called py3c [0] that
> contains all the necessary macros. To ensure consistency, this is
> where
> I've been pulling macros from when porting the stand-alone libraries.
> The library is not officially recognized by Python upstream (their
> first
> suggestion nowadays would probably be to port to Cython or CFFI).
Yes, that would be a big change. Thanks for not suggesting that :-)
> But, I
> am in the process of absorbing parts of Python's C Porting Howto [1].
>
> A superset of the macros I'd need for Samba is at:
> https://github.com/encukou/py3c/blob/master/include/py3c/compat.h
>
> The first part is specific to the porting strategy I use for Samba;
> it
> boils down to "use PyStr for native strings":
>
> * PyStr_* maps to PyString_* or PyUnicode_*
> * Python 2: PyBytes_* maps to PyString_*
> (You can ignore the static function PyStr_Concat, this wart is not
> needed for Samba.)
>
> The rest emulates py2 or py3 API in the other Python.
> (Unfortunately I can't use a single Python's API for both.)
>
> * Python 3: PyInt_* maps to PyLong_*
> * Module initialization uses the py3 syntax (except the function
> declaration – "MODULE_INIT_FUNC(name)" instead of "static PyObject
> *PyInit_name(void)").
So, the reason for the PyStr_ stuff is to avoid having an accidental
PyBytes -> PyString -> PyUnicode, either in the compiler or in
someone's head?
> I have gone through Samba's C extensions and am reasonably sure this
> is
> a superset of the compat layer needed to port them all. (Two
> exceptions
> – PyFile_AsFile and PyCObject – are better dealt with individually.)
>
>
> I'm attaching draft patches that port "samba.netbios" using options 1
> (inline macros) and 2 (shared header). (For the shared header,
> additional buildsystem integration would be needed, and possibly a
> better location for the header.)
The PyStr stuff grates with me a bit, but I guess that's OK. Others
may have stronger views however. It looks harmless enough.
> Let me know if you have any thoughts on this matter. And, thank you
> for
> your continued patience.
Thanks for your continued work on this.
The key will be finding all the right places to deal with PyString_* in
the generated headers, but PIDL knows what things are Unicode because
it has a charset annotation.
> [0] http://py3c.readthedocs.org
> [1] http://bugs.python.org/issue24937
> [2] http://py3c.readthedocs.org/en/latest/defs.html
Thanks,
Andrew Bartlett
--
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team https://samba.org
Samba Development and Support, Catalyst IT
https://catalyst.net.nz/services/samba