python unix encoding subprocess

What's the encoding used in the output of Linux commands like find, accessed from Python?


Python provides a subprocess module that allows fine-grained control of processes, but when I run a Unix command such as find, what is the encoding of the output of these standard GNU commands?

import shlex
import subprocess

# Run find and capture both output streams as raw bytes.
myProcess = subprocess.Popen(shlex.split('find ./dir -mindepth 1 -maxdepth 1 -type f -mtime +14'), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() returns a (stdout, stderr) tuple of bytes objects, not file objects.
myStdout, myStderr = myProcess.communicate()
print(myStdout.decode('utf-8'))  # or 'ascii'? Which codec is correct here?

I'm guessing you can get away with either decoding, but officially, how am I supposed to interpret the output of the standard Unix commands that "output text to the console"?


Solution

  • The encoding is not fixed; it depends on the user's locale. It is usually UTF-8, which is the default on most modern distributions, but the user can change it.

    You can check the current encoding in your terminal:

    $ locale charmap
    

    To get that from Python, you can use locale.getpreferredencoding():

    import locale
    
    ...
    
    encoding = locale.getpreferredencoding()
    print(myStdout.decode(encoding))
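
Putting the pieces above together, here is a minimal sketch of a helper that runs a command and decodes its output with the locale's encoding. The function name run_and_decode and the stand-in command are hypothetical, chosen only for illustration, and the sketch assumes the child process really does write text in the current locale's encoding. File names on Unix are arbitrary bytes, so errors="replace" is used to avoid an exception on bytes that do not fit.

```python
import locale
import subprocess
import sys


def run_and_decode(args):
    """Run a command and return its stdout decoded with the locale's encoding."""
    # Assumption: the child process writes text in the current locale's encoding.
    encoding = locale.getpreferredencoding(False)
    result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # errors="replace" avoids a UnicodeDecodeError if some bytes (for example an
    # oddly encoded file name) are not valid in that encoding.
    return result.stdout.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Hypothetical stand-in command; substitute the find invocation from the question.
    print(run_and_decode([sys.executable, "-c", "print('hello')"]))
```

subprocess.run also accepts an encoding argument (Python 3.6 and later) that decodes the pipes for you, which may be simpler if you do not need the raw bytes.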