Tags: python, unicode, utf-8, git-bash, python-unicode

Using UTF-8 in Python 3 string literals


I have a script I'm writing where I need to print the character sequence "Qä" to the terminal. My terminal is using UTF-8 encoding. My file has # -*- coding: utf-8 -*- at the top of it, which I think is not actually necessary for Python 3, but I put it there in case it made any difference. In the code, I have something like

print("...Qä...")

This does not produce Qä. Instead it produces Q▒.

I then tried

qa = "Qä".encode('utf-8')
print(f"...{qa}...")

This also does not produce Qä. Because qa is now a bytes object, the f-string inserts its repr, so it produces b'Q\xc3\xa4'.

I also tried

qa = u"Qä"
print(f"...{qa}...")

This also produces Q▒.

However, I know that Python 3 can open files that contain UTF-8 and use the contents properly, so I created a file called qa.txt, pasted Qä into it, and then used

with open("qa.txt") as qa_file:
    qa = qa_file.read().strip()
print(f"...{qa}...")

This works. However, it's beyond dumb that I have to create this file in order to print this string. How can I put this text into my code as a string literal?

This question is NOT a duplicate of a question asking about Python 2.7; I am not using Python 2.7.


Solution

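  • You're using Git Bash on Windows. On Windows, unless stdio is connected to a standard Windows console (which I don't think Git Bash's terminal counts as), Python defaults the standard streams to the locale encoding, which in your case is 'cp1252'. Your terminal expects UTF-8, not CP1252, so the single byte that CP1252 uses for ä (0xE4) arrives as an invalid UTF-8 sequence and is displayed as ▒. Your string literal is fine; only the output encoding is wrong. You can reconfigure the standard output stream to UTF-8 (Python 3.7+) with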

    import sys
    sys.stdout.reconfigure(encoding='utf-8')

    and similarly for stdin and stderr, or you can set the PYTHONIOENCODING environment variable to utf-8 before running Python to change the default stdin/stdout/stderr encodings.
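
To make the mismatch concrete, here is a minimal sketch that shows the bytes each encoding produces for "Qä" and then applies the reconfigure approach to all three standard streams. The helper names encoded_bytes and force_utf8_stdio are made up for illustration, and the getattr check is an assumption on my part to cope with environments that replace the standard streams with objects lacking reconfigure().

```python
import sys


def encoded_bytes(text, encoding):
    """Return the raw bytes Python would write for `text` under `encoding`."""
    return text.encode(encoding)


def force_utf8_stdio():
    """Switch the standard streams to UTF-8 where the stream supports it.

    Hypothetical helper: reconfigure() exists on io.TextIOWrapper since
    Python 3.7, but replaced streams (some IDEs, test runners) may lack it.
    """
    for stream in (sys.stdin, sys.stdout, sys.stderr):
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8")


# Under cp1252, "ä" is the single byte 0xE4, which is not valid UTF-8 by itself.
cp1252_bytes = encoded_bytes("Qä", "cp1252")  # b'Q\xe4'
# Under UTF-8, "ä" is the two-byte sequence 0xC3 0xA4, which the terminal expects.
utf8_bytes = encoded_bytes("Qä", "utf-8")     # b'Q\xc3\xa4'

if __name__ == "__main__":
    force_utf8_stdio()
    print("...Qä...")
```

Call force_utf8_stdio() once at the start of the script, before anything is read from stdin or printed. If you would rather not touch the code at all, the PYTHONIOENCODING route mentioned above achieves the same thing from the shell.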