Search code examples
pythonunicodeyoutube-dl

python youtube-dl output file not in unicode?


Is there anyway to get the youtube-dl.extract_info() function to use unicode when creating the output file?

I have encountered the problem that if you download something with unicode characters like | in the title then the output file name will not have the same character. It will be replaced with _ instead.

Take this song title for example. If I download it with youtube-dl then I get this file name 【Nightcore】→ Pretty Girl _ Lyrics-dMAOnScOyGE. Same thing happens with different kind of characters.

Is there any way to stop this? Because it's a annoying if you want do do anything with that file afterwards.

To get the new file name I would need to do something like os.listdir(dir) to get the file. So it's not impossible to get the new file name, but I am just interested if there is a easier way.


Solution

  • The encoding of | to _ is hardcoded in sanitize_filename in youtube_dl/utils.py. You can turn it off programatically by substituting youtube_dl.utils.sanitize_filename with your own implementation.

    However, doing so is not recommended, and not supported out of the box. This is because | is an invalid character on Windows and can be used to execute arbitrary commands if expanded in a buggy script.

    Insecure filenames were supported at one time, but I removed them from youtube-dl because too many people were shooting themselves in the foot, and often reported problems that clearly would have let any attacker execute arbitrary code on their machines.