python, python-3.x, encoding, readfile

Read file with Python without knowing encoding


I want to read all files in a folder (with os.walk) and convert them to a single encoding (UTF-8). The problem is that the files do not all share the same encoding. They could be UTF-8, UTF-8 with BOM, or UTF-16.

Is there any way to read those files without knowing their encoding?


Solution

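  • You can read the files in binary mode, which returns the raw bytes without needing an encoding. The chardet library can then guess the character encoding from those bytes, so you can decode the data with the detected encoding. Keep in mind that this module has limitations: detection is a heuristic guess and can be wrong, particularly on short inputs.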

    As an example:

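    from chardet import detect

    # Read raw bytes; binary mode avoids decoding with the wrong codec.
    with open('your_file.txt', 'rb') as ef:
        raw = ef.read()

    result = detect(raw)  # dict with 'encoding' and 'confidence' keys
    print(result['encoding'], result['confidence'])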
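
For the specific case in the question (UTF-8, UTF-8 with BOM, UTF-16), a rough sketch of the whole conversion could look like the following. It is not a definitive implementation. The helper names `guess_encoding` and `convert_tree` are made up for illustration, and the sketch assumes that UTF-16 files start with a BOM, that chardet is optional, and that `latin-1` is an acceptable last-resort fallback.

```python
import codecs
import os

# BOM signatures, UTF-32 first so that UTF-32 LE is not mistaken for UTF-16 LE
# (both start with the bytes FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw):
    """Guess a codec name for raw bytes (hypothetical helper).

    Order: BOM check, then strict UTF-8, then chardet if it is installed.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        import chardet
    except ImportError:
        # Assumption: latin-1 as a last resort, since it never fails to decode.
        return "latin-1"
    return chardet.detect(raw)["encoding"] or "latin-1"


def convert_tree(root):
    """Rewrite every file under root as UTF-8 without BOM, in place."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as f:
                raw = f.read()
            # The 'utf-8-sig', 'utf-16' and 'utf-32' codecs strip the BOM.
            text = raw.decode(guess_encoding(raw))
            # newline="" keeps the original line endings untouched.
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(text)
```

Calling `convert_tree('some_folder')` overwrites the files, so it is worth trying it on a copy first. UTF-16 text without a BOM is not handled reliably by this sketch: ASCII-only content of that kind still decodes as UTF-8 and is left unchanged.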