Porting Samba's CPython extensions to Python 3
pviktori at redhat.com
Mon Sep 7 11:43:19 UTC 2015
On 09/04/2015 05:04 AM, Andrew Bartlett wrote:
> On Fri, 2015-08-28 at 12:57 +0200, Petr Viktorin wrote:
>> Sorry for this long mail: a lot has happened since the last
>> and I need to refresh some points buried in the e-mail thread here:
>> In previous discussions, we agreed on a strategy for porting Samba to
>> Python 3: the stand-alone libraries would get a supported Python 3 port.
>> Patches for the rest of Samba would be tolerated if they do not
>> inconvenience other developers, and they would be unsupported (if it
>> breaks, it's on whomever cares about Python 3 to fix it).
>> With the patches for the last stand-alone library reviewed, I think it's
>> time to revive that discussion, to get a better idea of how porting
>> Samba to Python 3 should work.
>> Specifically, I'd like to come to understand what would least
>> inconvenience you, while allowing some kind of progress on this
>> In the mentioned thread, there is an idea that there is no rush to
>> – Python 2 will be around for another five years.
>> But, while five years is a lot of time, if we spend time waiting there
>> *will* be a rush later. I'm trying to avoid that. If five years is an
>> absolute deadline for porting to py3, testing, and removing support for
>> py2, I think it does make sense to start.
>> In particular, waiting until enterprise Linux distributions switch to
>> Python 3 creates a Catch-22 that would most likely result in everyone
>> waiting till the last possible moment, and then rushing wildly. Like
>> Samba, a distribution wants to switch all at once; but to do that the
>> code must be ready.
> While I'm not as much convinced about the timing from that perspective,
> I am convinced that I would rather keep working with you than start
> this again in a few years.
> I really do appreciate your patience and dedication to handling this
> difficult area.
Thanks. I appreciate your willingness to merge patches that may not
bring results in the short term.
>> Moving from the "when" to the "how":
>> Generally, there is opposition against a bespoke compatibility layer,
>> which could not be tested well and would not get much use beyond
>> As with any code written by one external developer, if I got hit by a
>> bus, the compatibility layer could bitrot.
>> However, *some* kind of a compatibility layer is needed.
>> The string type in Python 2 was split to "bytes" and "unicode", and
>> there is a need to either differentiate these two, or use unicode
>> everywhere in Python 2 (which would change the semantics of the Python 2
>> version, which is not practical for a project of Samba's size).
>> So, my approach is to differentiate between three kinds of strings:
>> - bytes (PyBytes; called "str" in py2, "bytes" in py3)
>> - native ("PyStr"; UTF-8 encoded "str" in py2; "str" in py3)
>> - text ("PyUnicode"; called "unicode" in py2, "str" in py3)
>> This string split is *the* difficult part of porting C extensions.
>> Compared to this, other decisions are fairly trivial: either use the py2
>> spelling or the py3 spelling of the same thing, and choose a point on
>> the spectrum between shared macros or inline #ifdefs.
>> Correspondingly, aside from the bytes/text split, the rest of the
>> porting process is largely mechanical.
> I don't understand why we need the PyStr_FromString macros however,
> given we didn't need them for Ldb?
But we did – they're added at the top of pyldb.c:
Some PyString are ported to PyBytes, some to PyStr, depending on what
they should be in Python 3.
>> The ideal solution for Samba would be if a compatibility layer was
>> distributed with Python itself. Unfortunately, this can't really
>> no features are added to Python 2.7 any more, and even if they were,
>> they couldn't be present in older 2.7 releases.
>> Realistically, I see three options for Samba, if it decides to start
>> 1) Include relevant macros in the files that need them. This is used
>> the stand-alone libraries (which typically have one Python module
>> This makes the code clear to anyone who knows C-API for Python 2 or
>> but when adding new macros it requires some care to have consistency.
>> 2) Put all compatibility macros in a shared header. This obscures the
>> code somewhat, with an additional header to know about, but ensures
>> the set of macros is the same throughout the project, and allows
>> documenting them fairly easily.
> I think we have to do this, or 3). We have a strong preference against
> duplicated functions and macros.
>> 3) Use a third-party library for the compatibility macros. This way, the
>> compat layer can be shared with other projects; it also makes it easier
>> to keep it tested and documented.
> We would prefer that, and we can import that as a third_party codebase.
OK. I will work on integrating py3c into Samba, along with continuing to
promote it across the Python ecosystem.
>> Regardless of which option is chosen, I have a pretty good idea about
>> what a compatibility layer would look like.
>> I have written a tested, documented library called py3c that
>> contains all the necessary macros. To ensure consistency, this is what
>> I've been pulling macros from when porting the stand-alone libraries.
>> The library is not officially recognized by Python upstream (their
>> suggestion nowadays would probably be to port to Cython or CFFI).
> Yes, that would be a big change. Thanks for not suggesting that :-)
>> But, I am in the process of absorbing parts of Python's C Porting Howto.
>> A superset of the macros I'd need for Samba are at:
>> The first part is specific to the porting strategy I use for Samba; it
>> boils down to "use PyStr for native strings":
>> * PyStr_* maps to PyString_* or PyUnicode_*
>> * Python 2: PyBytes_* maps to PyString_*
>> (You can ignore the static function PyStr_Concat, this wart is not
>> needed for Samba.)
>> The rest emulates py2 or py3 API in the other Python.
>> (Unfortunately I can't use a single Python's API for both.)
>> * Python 3: PyInt_* maps to PyLong_*
>> * Module initialization uses the py3 syntax (except the function
>> declaration – "MODULE_INIT_FUNC(name)" instead of "static PyObject
> So, the reason for the PyStr_ stuff is to avoid having an accidental
> PyBytes -> PyString -> PyUnicode, either in the compiler or in
> someone's head?
I'm not sure if I understand the question correctly, but it looks like
that's one of the reasons.
A simple hard reason is that __str__/__repr__ functions need to return
the native string type, and Python itself has no universal spelling for
The alternatives to introducing a "new type" for native strings are:
- use unicode on both versions (which would change the semantics on
Python 2)
- use bytes on both versions (which would require using b'' everywhere
in Python 3).
>> I'm attaching draft patches that port "samba.netbios" using options 1
>> (inline macros) and 2 (shared header). (For the shared header,
>> additional buildsystem integration would be needed, and possibly a
>> better location for the header.)
> The PyStr stuff grates with me a bit, but I guess that's OK. Others
> may have stronger views however. It looks harmless enough.
>> Let me know if you have any thoughts on this matter. And, thank you for
>> your continued patience.
> Thanks for your continued work on this.
> The key will be finding all the right places to deal with PyString_* in
> the generated headers, but PIDL knows what things are Unicode because
> it has a charset annotation.
Well, *finding* them is not hard, since PyString causes a compile-time
error on py3. The key is figuring out what to do with them.
I'm slowly progressing on a proof of concept.