Tags: python, unicode, encoding, utf-8

How do I check if a string is unicode or ascii?


What do I have to do in Python to figure out which encoding a string has?


Solution

  • In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.

    In Python 2, a string may be of type str or of type unicode. You can tell which with code along these lines:

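    # Python 2 only: the unicode type and the print statement do not exist in Python 3.
    def whatisthis(s):
        if isinstance(s, str):
            # In Python 2, str is a byte string.
            print "ordinary string"
        elif isinstance(s, unicode):
            # unicode holds decoded text.
            print "unicode string"
        else:
            print "not a string"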

    This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist purely of characters in the ASCII range, and a bytestring may contain ASCII, encoded Unicode, or even non-textual data.
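
For Python 3, the same two questions (which type is it, and is the content limited to the ASCII range) can be answered separately. The following is a minimal sketch, not a definitive recipe: the function name describe is made up for illustration, and str.isascii() assumes Python 3.7 or later. Note that it still cannot tell you which encoding a bytes object uses, because bytes carry no encoding label.

```python
def describe(value):
    """Report the Python 3 type of value and whether its content is pure ASCII.

    `describe` is a hypothetical helper written for this example.
    """
    if isinstance(value, str):
        # str.isascii() is available from Python 3.7 onward.
        kind = "text (str)"
        ascii_only = value.isascii()
    elif isinstance(value, (bytes, bytearray)):
        # Iterating over bytes yields integers; ASCII is the range 0..127.
        kind = "bytes"
        ascii_only = all(b < 0x80 for b in value)
    else:
        return "not a string"
    suffix = "ASCII only" if ascii_only else "contains non-ASCII"
    return f"{kind}, {suffix}"


print(describe("hello"))                        # text (str), ASCII only
print(describe("h\u00e9llo"))                   # text (str), contains non-ASCII
print(describe("h\u00e9llo".encode("utf-8")))   # bytes, contains non-ASCII
print(describe(42))                             # not a string
```

The type check and the content check are kept apart on purpose: a str that passes the ASCII test is still a Unicode string, and a bytes object that passes it may or may not be text at all.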