Writing Japanese text into an nfo file produces garbled characters

Below is the original nfo file, in the format that Emby uses:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<movie>
  <plot />
  <outline />
  <lockdata>false</lockdata>
  <dateadded>2023-02-22 21:52:29</dateadded>
  <title>old title</title>
  <sorttitle>old title</sorttitle>
  <runtime>119</runtime>
  <fileinfo>
    <streamdetails>
      <video>
        <codec>h264</codec>
        <micodec>h264</micodec>
        <bitrate>5744052</bitrate>
        <width>1920</width>
        <height>1080</height>
        <aspect>16:9</aspect>
        <aspectratio>16:9</aspectratio>
        <framerate>29.96973</framerate>
        <language>und</language>
        <scantype>progressive</scantype>
        <default>True</default>
        <forced>False</forced>
        <duration>119</duration>
        <durationinseconds>7168</durationinseconds>
      </video>
      <audio>
        <codec>aac</codec>
        <micodec>aac</micodec>
        <bitrate>256000</bitrate>
        <language>und</language>
        <scantype>progressive</scantype>
        <channels>2</channels>
        <samplingrate>48000</samplingrate>
        <default>True</default>
        <forced>False</forced>
      </audio>
    </streamdetails>
  </fileinfo>
</movie>

I am trying to update the title with the Python script below:

import xml.etree.ElementTree as ET

title = "千と千尋の神隠し"

# Load the NFO file
filename = "movie.nfo"
tree = ET.parse(filename)
root = tree.getroot()

# Find the <title> tag and replace its text value with the new title
title_elem = root.find("title")
title_elem.text = title

# Write the updated XML structure to the NFO file
tree.write(filename, encoding="utf-8", xml_declaration=True)

But after I run the script, the title turns into garbled characters:

<title>σìâπü¿σìâσ░ïπü«τÑ₧ΘÜáπüù</title>

I know it must be an encoding issue, but I do not know how to solve it.

The nfo file should be updated to:

<title>千と千尋の神隠し</title>

Solution

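You are facing a case of mojibake: the UTF-8 bytes of the title are being displayed as if they were cp437 (OEM-US) characters: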

print("千と千尋の神隠し".encode('utf-8').decode('cp437'))

σìâπü¿σìâσ░ïπü«τÑ₧ΘÜáπüù
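
To confirm the diagnosis, the garbling can be reversed. The following is a minimal sketch using only Python's built-in codecs; the variable names are illustrative. It relies on cp437 mapping every byte value to a character, so re-encoding the garbled text as cp437 gives back the original UTF-8 bytes:

```python
original = "千と千尋の神隠し"

# Simulate what the viewer does: interpret the UTF-8 bytes as cp437.
garbled = original.encode("utf-8").decode("cp437")

# Reverse it: encode back to the same bytes, then decode them as UTF-8.
recovered = garbled.encode("cp437").decode("utf-8")

print(garbled)
print(recovered == original)
```

If the round trip recovers the original title, no data was lost: the bytes on disk are fine and only their interpretation is wrong.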

The problem is the .NFO file extension:

The NFO file extension is used for a Warez Information File developed by THG. NFO file is basically pirated information pertaining to a software or program that is released and distributed by any organized group without the knowledge or permission of the creator or owner of such programs…

The Wikipedia article on .nfo says: NFO files often contain elaborate ANSI art (it is similar to ASCII art, but constructed from a larger set of 256 letters, numbers, and symbols, all codes found in IBM code page 437, often referred to as extended ASCII).

Oddly enough, *.nfo files are always recognized as OEM-US encoded, even in Notepad++ (see this issue on GitHub).

Result: your file is valid UTF-8; it is only displayed with the wrong encoding.

Proof #1:

import xml.etree.ElementTree as ET

# Load the NFO file
filename = "movie.nfo"
tree = ET.parse(filename)
root = tree.getroot()

# Find the <title> tag
title_elem = root.find("title")
print(title_elem.text)

千と千尋の神隠し

Proof #2:

filename = "movie.nfo"
with open(filename, mode='r', encoding='utf-8') as fnfo:
    lines = fnfo.readlines()

print([line for line in lines if '<title>' in line])

['  <title>千と千尋の神隠し</title>\n']
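
A third check works at the byte level, independent of any viewer or locale default. This is a sketch under stated assumptions: it writes a tiny stand-in file to a temporary directory (not the real Emby nfo), using the same `tree.write` call as the question, and then inspects the raw bytes:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

title = "千と千尋の神隠し"

# Build a tiny stand-in for movie.nfo (hypothetical, not the real Emby file).
root = ET.Element("movie")
ET.SubElement(root, "title").text = title

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "movie.nfo")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

    # Read the raw bytes, bypassing any text decoding.
    with open(path, "rb") as f:
        raw = f.read()

# The UTF-8 byte sequence of the title should be present in the file.
has_utf8_title = title.encode("utf-8") in raw

# The same bytes, decoded the way an OEM-US viewer would show them.
as_cp437 = raw.decode("cp437")

print(has_utf8_title)
print(as_cp437)
```

The cp437 rendering of these bytes should show the same garbled title as in the question, which is consistent with the script writing correct UTF-8 and the viewer decoding it as OEM-US.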