
How to handle Korean characters and special symbols when reading a filename


Here's my video file name "[M⧸V] 김예림(림킴) - Confess To You :: 킹더랜드(King the Land) OST Part.2 [vzH3-8_Y7oo].mp4"

import subprocess
url='https://www.youtube.com/watch?v=vzH3-8_Y7oo'
filename = subprocess.getoutput(f'yt-dlp -f 399 {url} --print filename')
print(filename)

Here's the output: [MV] () - Confess To You :: (King the Land) OST Part.2 [vzH3-8_Y7oo].mp4

As you can see, the '⧸' and the Korean characters are missing. How can I solve this?

I tried:

import subprocess
import unicodedata
url='https://www.youtube.com/watch?v=vzH3-8_Y7oo'
filename = subprocess.getoutput(f'yt-dlp -f 399 {url} --print filename')
filename_normalized = unicodedata.normalize('NFKD', filename).encode('ascii', 'ignore').decode('utf-8')
print(filename)

But this still does not print the right filename. I also found that Mandarin characters in the filename print properly, while Korean characters do not.
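A short sketch of why the normalization attempt above cannot recover the missing characters: NFKD splits each Hangul syllable into jamo, which are still non-ASCII, so encode('ascii', 'ignore') discards them together with '⧸'. The sample string is taken from the filename above; the variable names are only illustrative.

```python
import unicodedata

# Part of the expected filename, including the big solidus and Hangul.
name = "[M⧸V] 김예림(림킴)"

# NFKD decomposes Hangul syllables into conjoining jamo (still non-ASCII).
decomposed = unicodedata.normalize("NFKD", name)

# Dropping everything outside ASCII therefore removes the Korean text
# and the '⧸' symbol instead of preserving them.
ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")

print(ascii_only)  # [MV] ()
```

So this step can only remove characters; the decoding of the subprocess output has to be fixed instead.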

Solved: I added "--encoding utf-8" at the end of the command.

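from subprocess import run as RUN

# Ask yt-dlp to emit UTF-8, and decode the captured output as UTF-8.
cmd = "yt-dlp -f 399 https://www.youtube.com/watch?v=vzH3-8_Y7oo --print filename --encoding utf-8"
videos = RUN(cmd.split(), capture_output=True, encoding='utf-8').stdout
print(videos)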

Solution

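  • The run() function of the subprocess module is the recommended way to invoke a subprocess. It accepts an encoding argument that controls how the captured output is decoded. Without it, text output is decoded using the locale's default encoding, which may not cover Korean characters.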

    Thus:

    from subprocess import run as RUN
    
    cmd = [
        "yt-dlp",
        "-f",
        "399",
        "https://www.youtube.com/watch?v=vzH3-8_Y7oo",
        "--print",
        "filename"
    ]
    
    print(RUN(cmd, capture_output=True, encoding="utf-8").stdout)
    

    ...gives this output:

    [M⧸V] 김예림(림킴) - Confess To You :: 킹더랜드(King the Land) OST Part.2 [vzH3-8_Y7oo].mp4
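The same approach can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: run_utf8 is a hypothetical name, errors="replace" is an added assumption so that undecodable bytes do not raise, and a child Python process stands in for yt-dlp so the snippet runs without it installed.

```python
import subprocess
import sys


def run_utf8(cmd):
    """Run cmd (a list of arguments) and return its stdout decoded as UTF-8."""
    result = subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",   # decode the child's output as UTF-8, not the locale default
        errors="replace",   # assumption: substitute bad bytes rather than raising
        check=True,         # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Stand-in child process: writes UTF-8 bytes for Hangul text and a big solidus.
# The child code uses escapes so the command line itself stays ASCII-only.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('\\uae40\\uc608\\ub9bc \\u29f8'.encode('utf-8'))",
]

decoded = run_utf8(child)

# With yt-dlp installed, the call might look like this (not executed here):
# run_utf8(["yt-dlp", "-f", "399", url, "--print", "filename", "--encoding", "utf-8"])
```

If the child program does not emit UTF-8 by default, decoding alone is not enough; that is presumably why the "--encoding utf-8" flag in the asker's own fix was needed on their system.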