python python-2.7 non-ascii-characters python-unicode unicode-escapes

Removing Unicode characters like \u2026 from a string in Python 2.7


I have a string in Python 2.7 like this:

 This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!

How do I convert it to this?

This is some text that has to be cleaned! its annoying!

Solution

  • Python 2.x

    >>> s
    'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
    >>> print(s.decode('unicode_escape').encode('ascii','ignore'))
    This is some  text that has to be cleaned! it's annoying!
    

  • Python 3.x

    >>> s = 'This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!'
    >>> s.encode('ascii', 'ignore')
    b"This is some  text that has to be cleaned! it's annoying!"
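
In Python 3, both situations shown above (a string with literal backslash escapes, and a string that already holds the real characters) can be handled by one helper that returns a str instead of bytes. This is a minimal sketch, not part of the original answer: the name strip_non_ascii and its escaped flag are hypothetical, and the escaped=True path assumes the input contains literal sequences such as \u2026 and is otherwise pure ASCII.

```python
def strip_non_ascii(text, escaped=False):
    """Return text with every non-ASCII character dropped.

    Hypothetical helper for illustration. Set escaped=True when text
    holds literal backslash-u sequences that must be decoded first.
    """
    if escaped:
        # The 'unicode_escape' codec decodes bytes, so go through ASCII
        # first. Assumption: the escaped input itself is pure ASCII.
        text = text.encode('ascii').decode('unicode_escape')
    # Drop everything outside ASCII, then convert bytes back to str.
    return text.encode('ascii', 'ignore').decode('ascii')


# Literal backslash escapes, as in the Python 2 session.
raw = 'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
cleaned = strip_non_ascii(raw, escaped=True)

# Real Unicode characters, as in the Python 3 session.
also_cleaned = strip_non_ascii('caf\u00e9 \u2026 ok')
```

Removing a character does not remove the spaces around it, so cleaned still has a double space where \u03c0 used to be. If that matters, ' '.join(cleaned.split()) collapses runs of whitespace into single spaces.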