Is there any way to get the youtube-dl.extract_info() function to use Unicode when creating the output file name?
I have run into a problem: if you download something with special characters like | in the title, the output file name does not contain the same character. It is replaced with _ instead.
Take this song title for example.
If I download it with youtube-dl, I get this file name: 【Nightcore】→ Pretty Girl _ Lyrics-dMAOnScOyGE. The same thing happens with other kinds of characters.
Is there any way to stop this? It's annoying if you want to do anything with the file afterwards.
To get the new file name, I would need to do something like os.listdir(dir) and look for the file. So it's not impossible to get the new file name, but I'm interested in whether there is an easier way.
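One way to get the resulting name without listing the directory, assuming a single-video URL, is to let youtube-dl compute it: the YoutubeDL object's prepare_filename(info) method builds the name from the output template and the info dict returned by extract_info(), applying the same sanitization. The helper download_and_get_filename and the VIDEO_URL placeholder are hypothetical; the helper takes the YoutubeDL instance as a parameter so it can be tried without network access. It does not account for post-processors (such as audio extraction) that change the extension after the download.

```python
def download_and_get_filename(ydl, url):
    """Download a single video with an existing YoutubeDL instance and return
    the file name youtube-dl built from its output template.

    Hypothetical helper: relies only on YoutubeDL.extract_info() and
    YoutubeDL.prepare_filename(). Playlist URLs and post-processors that
    change the extension (e.g. audio extraction) are not handled.
    """
    # extract_info() downloads the video and returns its info dict.
    info = ydl.extract_info(url, download=True)
    # prepare_filename() should apply the same template and sanitization
    # (including the | -> _ replacement) that was used when writing the file.
    return ydl.prepare_filename(info)


# Example use (needs youtube-dl installed and network access;
# VIDEO_URL is a placeholder):
#
#   import youtube_dl
#   opts = {'outtmpl': '%(title)s-%(id)s.%(ext)s'}
#   with youtube_dl.YoutubeDL(opts) as ydl:
#       path = download_and_get_filename(ydl, VIDEO_URL)
```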
The replacement of | with _ is hardcoded in sanitize_filename in youtube_dl/utils.py. You can turn it off programmatically by substituting youtube_dl.utils.sanitize_filename with your own implementation.
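If you do write your own implementation, one hypothetical alternative to replacing | with _ is to map each character that Windows forbids in file names to a visually similar full-width Unicode character, so the name still looks like the original title. The function lookalike_filename and its mapping table are illustrative assumptions, not part of youtube-dl, and the sketch does not show how to wire it in. It also does not make names shell-safe: characters such as ;, & and $ pass through unchanged, and Windows reserved names like CON are not handled.

```python
# Hypothetical mapping from characters Windows forbids in file names
# to visually similar Unicode characters.
_LOOKALIKES = {
    '|': '\uff5c',   # FULLWIDTH VERTICAL LINE
    '/': '\u29f8',   # BIG SOLIDUS
    '\\': '\u29f9',  # BIG REVERSE SOLIDUS
    ':': '\uff1a',   # FULLWIDTH COLON
    '*': '\uff0a',   # FULLWIDTH ASTERISK
    '?': '\uff1f',   # FULLWIDTH QUESTION MARK
    '"': '\uff02',   # FULLWIDTH QUOTATION MARK
    '<': '\uff1c',   # FULLWIDTH LESS-THAN SIGN
    '>': '\uff1e',   # FULLWIDTH GREATER-THAN SIGN
}


def lookalike_filename(title):
    """Return title with forbidden characters swapped for lookalikes and
    control characters removed. Other characters are kept as-is."""
    out = []
    for ch in title:
        if ord(ch) < 32 or ord(ch) == 127:
            continue  # drop control characters entirely
        out.append(_LOOKALIKES.get(ch, ch))
    return ''.join(out)
```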
However, doing so is not recommended, and it is not supported out of the box. This is because | is an invalid character in Windows file names and can be used to execute arbitrary commands if the file name is expanded in a buggy script.
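To illustrate the "buggy script" risk, the sketch below uses Python's shlex module to tokenize a command line roughly the way a POSIX shell would, keeping operators such as | as separate tokens. The title and the ffprobe command are made up for illustration. Pasting a file name that contains | into a shell command string turns it into a pipeline; shlex.quote keeps it as a single argument, and passing an argument list to subprocess without shell=True avoids shell parsing altogether.

```python
import shlex


def shell_tokens(command_line):
    """Split a command line roughly as a POSIX shell would, with shell
    operators such as | returned as separate tokens."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    return list(lexer)


# A hypothetical title that smuggles a second command in after a pipe.
filename = 'Pretty Girl | touch pwned.mp4'

# Buggy: the name is pasted into a shell command unquoted, so a shell would
# run "touch pwned.mp4" as a second command in a pipeline.
buggy = 'ffprobe ' + filename

# Safer: quote the name so it stays a single argument.
safe = 'ffprobe ' + shlex.quote(filename)

print(shell_tokens(buggy))
print(shell_tokens(safe))
```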
Insecure filenames were supported at one time, but I removed them from youtube-dl because too many people were shooting themselves in the foot, and often reported problems that clearly would have let any attacker execute arbitrary code on their machines.