[clug] Encoding

Kunshan Wang wks1986 at gmail.com
Thu Dec 31 09:23:34 UTC 2015


Make sure you use Python3. If you really really have to use Python2, I
recommend adding "from __future__ import unicode_literals" so that
literals like '8', 'abc', '\xae' and '\u00ae' have type "unicode" rather
than "str", just as if you typed u'8', u'abc', u'\xae' or u'\u00ae'. In
Python3, "str" is already unicode, and "bytes" is the type of 8-bit byte
sequences.
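A quick way to see the str/bytes split in Python3. This is a minimal sketch that any Python 3 interpreter should run as-is:

```python
# Python 3: str holds unicode text, bytes holds raw 8-bit values.
text = 'abc'
data = b'abc'

print(type(text).__name__)  # str
print(type(data).__name__)  # bytes

# The two types never compare equal, even with the same ASCII content.
print(text == data)  # False

# The u prefix is accepted again since Python 3.3 and means plain str.
print(u'abc' == text)  # True

# Converting between them always goes through an explicit encoding.
print(text.encode('utf8') == data)  # True
print(data.decode('utf8') == text)  # True
```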

The following paragraphs assume Python3.

The default source file encoding of Python3 is utf8, so you can type '®'
or even '世界你好' directly in the source code, provided the file itself
is saved in utf8. You can (or, if you use Python2 and the file contains
non-ASCII characters, have to) also declare the file encoding explicitly
in a comment on the first or second line of a .py file, in the form:

# coding: utf8

or whatever character set you want. Python matches the comment against
the regular expression r'coding[=:]\s*([-\w.]+)'.
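To see what that pattern accepts, here is a small sketch that applies it to a few lines. The helper name declared_encoding is made up for illustration, and it is a simplification: the real tokenizer only looks at the first two lines of the file and has its own rules on top of the regex.

```python
import re

# The pattern quoted above.
CODING_RE = re.compile(r'coding[=:]\s*([-\w.]+)')

def declared_encoding(line):
    """Hypothetical helper: return the encoding named in a comment line, or None."""
    # Only comments can carry a declaration.
    if not line.lstrip().startswith('#'):
        return None
    m = CODING_RE.search(line)
    return m.group(1) if m else None

for line in ['# coding: utf8',
             '# -*- coding: latin-1 -*-',
             '# vim: set fileencoding=utf-8 :',
             "x = 'no declaration here'"]:
    print(repr(line), '->', declared_encoding(line))
```

Note that the Emacs and Vim style comments also match, because "fileencoding=utf-8" contains "coding=utf-8".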

Further reading:
https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations
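The effect of the declaration can be reproduced without writing any files, by handing raw source bytes to compile(), which honours coding declarations when its input is bytes. This is only a sketch: run() is a hypothetical helper, and the exact wording of the error message varies between Python versions.

```python
# The same assignment, x = '\u00ae', as raw source bytes in two encodings.
utf8_source = "x = '\u00ae'\n".encode('utf8')      # b"x = '\xc2\xae'\n"
latin1_source = "x = '\u00ae'\n".encode('latin1')  # b"x = '\xae'\n"

def run(source):
    """Hypothetical helper: compile and execute source bytes, return x."""
    namespace = {}
    exec(compile(source, 'tt.py', 'exec'), namespace)
    return namespace['x']

# 1. No declaration: Python 3 decodes the source bytes as utf8.
print(run(utf8_source) == '\u00ae')  # True

# 2. Latin-1 bytes with no declaration are not valid utf8, so this fails.
try:
    run(latin1_source)
except SyntaxError as e:
    print('SyntaxError:', e.msg)

# 3. Declaring the encoding on the first line fixes it.
declared = b'# coding: latin1\n' + latin1_source
print(run(declared) == '\u00ae')  # True
```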

Of course you can always escape those characters, as in '\xae' or
'\u00ae'. In this escaped form, the number is the unicode code point,
not the bytes the character is encoded as. For example, '®', '\xae' and
'\u00ae' are the same string; encoded in utf8 it is b'\xc2\xae', but
encoded in latin1 it is b'\xae' (a bytes object, not a str).
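The same point in code, as a minimal sketch (it assumes the snippet itself is saved as utf8, which is the Python3 default):

```python
# Three spellings of the same one-character string, U+00AE.
a = '®'
b = '\xae'
c = '\u00ae'
print(a == b == c)   # True
print(ord(a))        # 174, the code point (0xae)

# Encoding turns the str into bytes; the result depends on the codec.
print(a.encode('utf8'))    # b'\xc2\xae'
print(a.encode('latin1'))  # b'\xae'

# Decoding utf8 bytes with the wrong codec silently gives the wrong text.
print(a.encode('utf8').decode('latin1'))  # Â®
```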

Kunshan

On 29/12/15 18:22, Adrian wrote:
> I have a python file which I wish to execute but it results in an error.
> The error is explicit but the solution is not obvious as to what to do.
> 
> Here it is:
> 
> adrian at adrian-TravelMate-6293:~/thought.treasure/python$ python tt.py
>   File "tt.py", line 144
> SyntaxError: Non-ASCII character '\xae' in file tt.py on line 144, but
> no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
> 
> 
> and here is the file tt.py:
> F_SUPERLATIVE       = '8'
> F_ELEMENT           = '9'
> F_TRADEMARK         = '®'
> F_MODAL             = 'µ'
> F_AMERICAN          = 'À'
> 
> 
> It is the line marked F_TRADEMARK, and probably those that follow.
> 
> Adrian
> 
