[PATCH] provision: use ASCII quotes

Wed Mar 13 08:18:08 UTC 2019

Hi Douglas,

-<| Quoting Douglas Bagnall <douglas.bagnall at catalyst.net.nz>, on Wednesday, 2019-03-13 12:36:44 PM |>-
> hi Philipp,
>  
> > provisioning on a C locale breaks for me on devel master / commit
> > a68e8af2d1, running under Python 3. Error output see the bottom
> > of the mail. The problem is that open(…).read() trips over the
> > Unicode quotes in the license blurb of extended-rights.ldif.
> 
> Your patch is obviously good, but it is not quite perfect.
> 
> This exact string ("(“this documentation”)") has already caused the
> trouble in https://bugzilla.samba.org/show_bug.cgi?id=13826 where
> it refers to a file directly downloaded from Microsoft documentation.
> In that case we don't want to edit the file. Instead we do something
> like this:
> 
> -    input_file = open(input_file_name, "r")
> +    input_file = io.open(input_file_name, "rt", encoding='utf8')

I had that first actually but then I tested all ldif files in the
tree and it turned out that only these two codepoints in a single
file were affected.
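
A check of that kind can be sketched roughly as follows. This is not
the script I actually ran; the helper name find_non_ascii() is made up
for illustration, and it assumes the files are valid UTF-8.

```python
def find_non_ascii(path):
    """Return (line_number, character) pairs for each non-ASCII character.

    Hypothetical helper for illustration; assumes the file is valid UTF-8.
    """
    hits = []
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Anything above code point 127 cannot be decoded as ASCII.
            hits.extend((line_number, ch) for ch in line if ord(ch) > 127)
    return hits
```

Running it over each ldif file and printing the non-empty results
would list the offending lines and characters.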

Under Python 3, io.open() and open() are the same function btw., and
the "t" mode flag is redundant because text mode is the default.
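
Both points are easy to verify under Python 3. The snippet below is
only a quick illustration on a throwaway temporary file; it does not
hold for Python 2, where io.open() and the builtin open() differ.

```python
import io
import os
import tempfile

# On Python 3, io.open is just another name for the builtin open.
same_function = io.open is open

# "r" and "rt" both yield a text-mode file object.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "r") as plain, open(path, "rt") as explicit:
        both_text = (isinstance(plain, io.TextIOWrapper)
                     and isinstance(explicit, io.TextIOWrapper))
finally:
    os.remove(path)

print(same_function, both_text)
```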

> source4/setup/extended-rights.ldif doesn't look to be directly
> downloaded or machine generated, but it looks *close*, and there might
> be others we haven't found yet.

According to git log some hand-editing took place.

> So, my question is: does adding "encoding='utf8'" in the right place
> in read_and_sub_file() also solve the problem?

It does, see attached patch.
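
For illustration, here is a small standalone sketch of the failure and
the fix. It is not the Samba code: read_text_utf8() is a made-up name,
and encoding="ascii" is used as a stand-in for what open() falls back
to on a C locale, since newer Python versions may pick UTF-8 there on
their own.

```python
import os
import tempfile


def read_text_utf8(file_name):
    """Read a whole file as UTF-8, regardless of the locale's encoding.

    Hypothetical helper; mirrors the encoding argument used in the patch.
    """
    with open(file_name, "r", encoding="utf-8") as handle:
        return handle.read()


# Write a sample containing the two Unicode quotes from the license blurb.
fd, path = tempfile.mkstemp(suffix=".ldif")
with os.fdopen(fd, "w", encoding="utf-8") as handle:
    handle.write("# (\u201cthis documentation\u201d)\n")

try:
    # Assumption: ASCII decoding approximates the C-locale default.
    try:
        with open(path, "r", encoding="ascii") as handle:
            handle.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # With an explicit encoding the same file reads fine.
    data = read_text_utf8(path)
finally:
    os.remove(path)
```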

read_and_sub_file() is used in other contexts as well so I
triggered a CI run; let’s see what breaks ;)
https://gitlab.com/samba-team/devel/samba/pipelines/51577845

> If it does, I would prefer that.

Works for me.

Philipp

-------------- next part --------------
From eeba60db402aae81dda291f72269e3834075e4d1 Mon Sep 17 00:00:00 2001
From: Philipp Gesang <philipp.gesang at intra2net.com>
Date: Tue, 12 Mar 2019 15:43:42 +0100
Subject: [PATCH] python/samba: read files as UTF-8 in read_and_sub_file()

Provisioning fails on C locale due to the Unicode quotes in ldif
data. Patch read_and_sub_file() to read the files as UTF-8.

Signed-off-by: Philipp Gesang <philipp.gesang at intra2net.com>
---
 python/samba/__init__.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/samba/__init__.py b/python/samba/__init__.py
index 93240dddfbb..d851bf3606c 100644
--- a/python/samba/__init__.py
+++ b/python/samba/__init__.py
@@ -280,7 +280,7 @@ def read_and_sub_file(file_name, subst_vars):
     :param file_name: File to be read (typically from setup directory)
      param subst_vars: Optional variables to subsitute in the file.
     """
-    data = open(file_name, 'r').read()
+    data = open(file_name, 'r', encoding="utf-8").read()
     if subst_vars is not None:
         data = substitute_var(data, subst_vars)
         check_all_substituted(data)
-- 
2.20.1
