python, python-3.x, yaml, pyyaml

PyYAML not reading a non-ASCII character ("§") correctly


I have the following file (sample.yml):

Key: §

When I try to read it with the following function:

import yaml

with open("sample.yml", "r") as f:
    file = yaml.safe_load(f)

the file is not read correctly: print(file) returns

{'Key': 'Â§'}

I would prefer to avoid escape sequences, but is there an escape sequence that works in this case? Is there a way to do it without one?

I also tried putting the § in single quotes ('§') or double quotes ("§"), but this did not solve the problem. Using yaml.load instead of yaml.safe_load did not help either.

I am using PyYAML v5.4.1.

How can I read the yaml file correctly?


Solution

  • You should pass the encoding explicitly to be certain:

    with open(filepath, "r", encoding="utf-8") as f:
        file = yaml.safe_load(f)

    From open's documentation:

    The default encoding is platform dependent (whatever locale.getpreferredencoding() returns).

    This is still true as of Python 3.10, but the default could change to UTF-8 in a future release.

    The output you posted looks like what you get when UTF-8 bytes are decoded with a single-byte encoding such as Latin-1. The UTF-8 byte representation of § is 0xC2 0xA7. In Latin-1, 0xC2 is Â and 0xA7 is §, so the single character ends up appearing as the two characters Â§.
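
The effect can be reproduced without PyYAML. The following is a minimal sketch using only the standard library; the temporary file stands in for sample.yml, and Latin-1 is assumed as the mismatched encoding (the actual platform default may differ, for example cp1252 on Windows, which gives the same result for these two bytes).

```python
import locale
import os
import tempfile

# Encoding that open() falls back to when none is passed (platform dependent).
print(locale.getpreferredencoding(False))

# "§" is one character but two bytes in UTF-8: 0xC2 0xA7.
raw = "Key: §\n".encode("utf-8")
print(raw)  # b'Key: \xc2\xa7\n'

# Write the bytes to a temporary file standing in for sample.yml.
with tempfile.NamedTemporaryFile(suffix=".yml", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Wrong: each byte is decoded as its own Latin-1 character.
    with open(path, "r", encoding="latin-1") as f:
        wrong = f.read()
    # Right: the two bytes are decoded together as one UTF-8 character.
    with open(path, "r", encoding="utf-8") as f:
        right = f.read()
finally:
    os.remove(path)

print(repr(wrong))  # 'Key: Â§\n'
print(repr(right))  # 'Key: §\n'
```

If the first print shows something other than a UTF-8 encoding, that is most likely why the file was misread; passing encoding="utf-8" to open makes the result independent of that default.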