I'm learning Python and tried to make a hanging game (literal translation - don't know the real name of the game in English. Sorry.). For those who aren't familiar with this game, the player must discover a secret word by guessing one letter at a time.
In my code, I get a collection of secret words which is imported from a txt file using the following code:
words_bank = open('palavras.txt', 'r')
words = []
for line in words_bank:
words.append(line.strip().lower())
words_bank.close()
print(words)
The output of print(words)
is ['ma\xc3\xa7\xc3\xa3', 'a\xc3\xa7a\xc3\xad', 'tucum\xc3\xa3']
but if I try print('maçã, açaí, tucumã')
in order to check the special characters, everything is printed correctly. Looks like the issue is in the encoding (or decoding... I'm still reading lots of articles about it to really understand) special characters from files.
The content of line 1 of my code is # coding: utf-8
because after some research I found out that I have to specify the Unicode format that is required for the text to be encoded/decoded. Before adding it, I was receiving the following message when running the code:
File "path/forca.py", line 12
SyntaxError: Non-ASCII character '\xc3' in file path/forca.py on line 12, but no encoding declared
Line 12 content: print('maçã, açaí, tucumã')
Things that I've already tried:
encode='utf-8'
as parameter in open('palavras.txt', 'r')
decode='utf-8'
as parameter in open('palavras.txt', 'r')
#coding: latin1
My OS is Ubuntu 20.04 LTS, my IDE is VS Code. Nothing works! I don't know what search and what to do anymore.
SOLUTION HERE
Thanks to the help given by the friends above, I was able to find out that the real problem was in the combo VS Code extension (Code Runner) + python alias version from Ubuntu 20.04 LTS.
Code Runner is set to run codes in Terminal in my situation, so apparently, when it calls for python the alias version was python 2.7.x
. To overcome this situation I've used this thread to set python 3 as default.
It's done! Whenever python is called, both in terminal and VS Code with Code Runner, all special characters works just fine.
Thank's everybody for your time and your help =)
This only happens when using Python 2.x.
The error is probably because you're printing a list not printing items in the list.
When calling print(words)
(words
is a list), Python invokes a special function called repr
on the list object. The list then creates a summary representation of the list by calling repr
in each child in the list, then creates a neat string visualisation.
repr(string)
actually returns an ASCII representation (with escapes) rather than a suitable version for your terminal.
Instead, try:
for x in words:
print(x)
Note. The option for open
is encoding
. E.g
open('myfile.txt', encoding='utf-8')
You should always, always pass the encoding
option to open - Python <=3.8 on Linux and Mac will assume UTF-8 (for most people). Python <=3.8 on Windows will use an 8-bit code page.
Python 3.9 will always use UTF-8
See Python 2.x vs 3.x behaviour:
Py2
>>> print ['maçã', 'açaí', 'tucumã']
['ma\xc3\xa7\xc3\xa3', 'a\xc3\xa7a\xc3\xad', 'tucum\xc3\xa3']
>>> repr('maçã')
"'ma\\xc3\\xa7\\xc3\\xa3'"
>>> print 'maçã'
maçã
Py3
>>> print(['maçã', 'açaí', 'tucumã'])
['maçã', 'açaí', 'tucumã']
>>> repr('maçã')
"'maçã'"