Tags: python, ascii, special-characters, non-ascii-characters, html-escape-characters

Converting special HTML characters to ASCII in Python


I want to convert special characters that I encounter while reading web pages to ASCII. I've tried a lot, but I can't figure it out. Below are some examples, each stored in a Python string. I don't know what the current encoding of the web page is, but I want to convert the text to ASCII.

Apaydın Ünal > should become Apaydin Unal
Íñigo Martínez > should become Inigo Martinez
Üstünel > should become Ustunel

Who can help me?

EDIT: Thanks, I forgot. I'm using Python 2.7


Solution

  • Give Unidecode (https://pypi.python.org/pypi/Unidecode) a try. It transliterates a Unicode string into a close ASCII approximation:

    >>> from unidecode import unidecode
    >>> unidecode(u'ko\u017eu\u0161\u010dek')
    'kozuscek'
    

    To detect the encoding of the page, see the question "Determine the encoding of text in Python".
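
If installing Unidecode is not an option, the same idea can be roughly approximated with the standard library alone. The sketch below is not Unidecode's method and covers far fewer characters. The names `decode_page`, `to_ascii`, and `FALLBACK` are hypothetical, the list of candidate encodings is an assumption (the question does not say what the page uses), and the fallback table is only a small illustrative sample. It is written with Python 3 in mind; the `u''` literals are kept so that it should also load on Python 2.7.

```python
import unicodedata

# Hypothetical fallback table: letters that have no Unicode decomposition,
# so NFKD normalization alone would leave them non-ASCII and they would be
# dropped. This is a small sample, not a complete list.
FALLBACK = {
    u'\u0131': u'i',   # dotless i (as in "Apaydin")
    u'\u00df': u'ss',  # sharp s
    u'\u00f8': u'o',   # o with stroke
    u'\u0142': u'l',   # l with stroke
}


def decode_page(raw, encodings=('utf-8', 'latin-1')):
    """Decode raw bytes by trying each candidate encoding in order.

    Assumption: the page is UTF-8 or, failing that, Latin-1. Latin-1 maps
    every byte to a character, so it never raises, but the result may be
    wrong if the page actually uses some other encoding.
    """
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')


def to_ascii(text):
    """Return an ASCII-only approximation of a Unicode string."""
    # Replace the characters that NFKD cannot decompose.
    text = u''.join(FALLBACK.get(ch, ch) for ch in text)
    # Split accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop whatever is still non-ASCII (mainly the combining marks).
    return decomposed.encode('ascii', 'ignore').decode('ascii')


raw = u'\u00cd\u00f1igo Mart\u00ednez'.encode('utf-8')  # "Íñigo Martínez"
print(to_ascii(decode_page(raw)))
```

Any character that neither decomposes nor appears in `FALLBACK` is silently dropped, which is the main reason to prefer Unidecode when it is available.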