[PATCH] provision: use ASCII quotes
douglas.bagnall at catalyst.net.nz
Wed Mar 13 10:44:24 UTC 2019
Philipp Gesang via samba-technical wrote:
>>> - input_file = open(input_file_name, "r")
>>> + input_file = io.open(input_file_name, "rt", encoding='utf8')
>> I had that first actually but then I tested all ldif files in the
>> tree and it turned out that only these two codepoints in a single
>> file were affected.
>> io.open() and open() are the same btw. and "t" mode is redundant.
They are *now*. Last week they weren't! That snippet is from those ancient
times when we theoretically supported 2.6.
But anyway, I am OK with editing extended-rights.ldif, because as you say
it has already been edited and is the only funny one there.
I had a look elsewhere with uchardet:
$ for x in $(git ls-files | grep -v /heimdal | grep -v third_party/ | grep -vP '\.tdb(\.dump)?$' | grep -vP '\.(reg|png|gpg|po|gz|keytab|SAMBABACKUP|dat)$' | grep -v CA-samba.example.com | grep -v examples ); do [ -f $x ] && [ $(uchardet $x) != ASCII ] && printf '%20s %s\n' $(uchardet $x); done | sort | uniq -c
the non-UTF-8s are mostly false positives in C files where people spell
their names correctly in copyright lines. They are really utf-8, but
(e.g.) ö decomposes into two iso-8859 chars if you look at it wrong and
The non-ASCII we parse is probably these:
and at least some of the detections are correct. The non-UTF-8 ones must already
have special handling. And meanwhile...
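
To illustrate the ö false positive mentioned above, here is a minimal sketch in Python 3 (not taken from the thread; the variable names are just for illustration). It shows how the two UTF-8 bytes of a single ö turn into two characters when the same bytes are read as ISO-8859-1, which is roughly what a detector can be fooled by:

```python
# "ö" is a single code point (U+00F6) but two bytes in UTF-8.
utf8_bytes = "ö".encode("utf-8")

# Reading those same two bytes as ISO-8859-1 yields two separate
# characters, because every byte is a valid ISO-8859-1 character.
misread = utf8_bytes.decode("iso-8859-1")

# Decoding with the right codec round-trips back to the original.
roundtrip = utf8_bytes.decode("utf-8")

print(repr(utf8_bytes), repr(misread), repr(roundtrip))
```

Because every byte sequence is valid ISO-8859-1, the misreading never raises an error, so a short file with one accented name gives a detector very little to go on.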
>> read_and_sub_file() is used in other contexts as well so I
>> triggered a CI run; let’s see what breaks ;)
I think we want that, because these files are data and we don't want to leave their
interpretation up to environment variables.
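
As a rough sketch of what pinning the encoding means in practice (read_text_utf8 is a hypothetical helper for illustration, not the actual read_and_sub_file() code, and the sample line is made up):

```python
import io
import os
import tempfile


def read_text_utf8(path):
    """Read a data file as UTF-8 regardless of LANG/LC_ALL settings."""
    # Without encoding=..., Python 3 falls back to the locale's
    # preferred encoding, so the result could differ between machines.
    with io.open(path, "r", encoding="utf8") as f:
        return f.read()


# Write a small file containing non-ASCII quotes as raw UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix=".ldif")
with os.fdopen(fd, "wb") as f:
    f.write(u"displayName: \u2018quoted\u2019\n".encode("utf-8"))

text = read_text_utf8(path)
os.remove(path)
```

With the encoding given explicitly, the same bytes decode to the same text whether the process runs under a C locale or a UTF-8 one.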
>>> If it does, I would prefer that.
>> Works for me.
Noel Power wrote:
> lgtm - on a side note Douglas we already talked about this before when
> we came across some similar issue and you did some code analysis on
> existing open vs io.open (under python2).
Yes. Perhaps I did. Obviously io.open defaults to unicode,
and there is this:
but all those subtleties... who cares any more if py2 is gone?