Search code examples
pythonasciidecodeasciiencoding

Handling ascii char in python string


i have file having name "SSE-Künden, SSE-Händler.pdf" which having those two unicode char ( ü,ä) when i am printing this file name on python interpreter the unicode values are getting converted into respective ascii value i guess 'SSE-K\x81nden, SSE-H\x84ndler.pdf' but i want to

test dir contains the pdf file of name 'SSE-Künden, SSE-Händler.pdf'

i tried this: path = 'C:\test' for a,b,c in os.walk(path): print c

['SSE-K\x81nden, SSE-H\x84ndler.pdf']

how do i convert this ascii chars to its respective unicode vals and i want to show the original name("SSE-Künden, SSE-Händler.pdf") on interpreter and also writeing into some file as it is.how do i achive this. I am using Python 2.6 and windows OS.

Thanks.


Solution

  • Assuming your terminal supports displaying the characters, iterate over the list of files and print them individually (or use Python 3, which displays Unicode in lists):

    Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> for p,d,f in os.walk(u'.'):
    ...  for n in f:
    ...   print n
    ...
    SSE-Künden, SSE-Händler.pdf
    

    Also note I used a Unicode string (u'.') for the path. This instructs os.walk to return Unicode strings as opposed to byte strings. When dealing with non-ASCII filenames this is a good idea.

    In Python 3 strings are Unicode by default and non-ASCII characters are displayed to the user instead of displayed as escape codes:

    Python 3.2.1 (default, Jul 10 2011, 21:51:15) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> for p,d,f in os.walk('.'):
    ...  print(f)
    ...
    ['SSE-Künden, SSE-Händler.pdf']