Tags: python, string, double-quotes

python string including double quote character


I have input strings made up of assorted characters, including double and single quotes (" and '):

B@SS$*JU(PQ
AD&^%$^@!$
%()%@@DDSFD"*")(#
ABD*E@(%J^&@

However, when I read the input above from a text file and print it, the double quotes in the third line come out as \xe2\x80\x9d

I am aiming to do a simple character count:

B 2
@ 3
S 2
$ 3
etc.

so I want to be able to output

" 3

in the above list. Should I replace the double quotes with something so I can count them and print out the count?

Thanks a lot.


Solution

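  • \xe2\x80\x9d

    is the UTF-8 byte sequence for the right double quotation mark ” (U+201D), a "curly" quote that is a different character from the ASCII double quote " (U+0022). Decoding the bytes from UTF-8 turns the three bytes into a single Unicode character. In Python 2: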

    >>> print "\xe2\x80\x9d".decode("utf-8")
    ”
    >>> len("\xe2\x80\x9d".decode("utf-8"))
    1
    

    If you are using Python 3:

    >>> print(b"\xe2\x80\x9d".decode("utf-8"))
    ”
    >>> len(b"\xe2\x80\x9d".decode("utf-8"))
    1
    

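    So, to count the characters in your file (in Python 2):

    from collections import defaultdict

    count = defaultdict(int)  # created once so counts accumulate across lines
    with open("filename", 'r') as f:
        for text in f:
            # Python 2 reads byte strings; decode so each character is one item
            decoded = text.decode("utf-8")
            for ch in decoded.rstrip("\n"):
                count[ch] += 1

    for ch, n in count.items():
        print ch.encode("utf-8"), n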
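
For Python 3, the same count might look like the sketch below. It assumes the file really is UTF-8 encoded. The names count_chars, count_file and QUOTE_MAP are made up for illustration, and folding curly quotes into straight ones is an optional extra, since the question asks about counting " and the file actually contains ”.

```python
from collections import Counter

# Hypothetical mapping: fold curly quotes into their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def count_chars(lines, normalize_quotes=False):
    """Count characters across an iterable of already-decoded text lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")  # do not count the line terminator
        if normalize_quotes:
            line = line.translate(QUOTE_MAP)
        counts.update(line)  # adds one to the count of each character
    return counts


def count_file(path, normalize_quotes=False):
    """Count characters in a text file, assuming it is UTF-8 encoded."""
    # encoding="utf-8" makes open() decode the bytes while reading
    with open(path, encoding="utf-8") as f:
        return count_chars(f, normalize_quotes)
```

Calling count_file("filename", normalize_quotes=True) and looping over the result's items() would then report curly and straight double quotes together under ". Without normalize_quotes, ” and " are counted as separate characters.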