python unicode

<200b> weird sign in string - how to remove it


I have these weird <200b> signs in my strings. What are they and how can I remove them? They seem to be just whitespace.

You appreciate traditional values ​​and expect respect

<200b><200b> is how it displays when I look at it in the console.


Solution

  • If you want to remove it, you can normalize it with unicodedata...

    >>> import unicodedata
    >>> unicodedata.normalize('NFC', u'Goodbye\u200b\u200bGarbage').encode('ascii', 'ignore')
    'GoodbyeGarbage'

    Note that this simply returns an ASCII byte string (in Python 3, a bytes object displayed as b'GoodbyeGarbage'), so you no longer have a unicode string after using this technique.

    Another option, which works for the example you provided because the only non-ASCII characters in it are the ones you want to drop...

    >>> u'Goodbye\u200b\u200bGarbage'.encode('ascii', 'ignore')
    'GoodbyeGarbage'

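    Adding unicodedata gives you more flexibility to deal with strange cases: with a decomposing form such as NFKD, an accented letter like é is split into a plain e plus a combining mark, so the base letter survives the ASCII encode. The NFC form used in the first example composes rather than decomposes, so it does not help in that way. A raw .encode('ascii', 'ignore') strips out every non-ASCII character without trying to normalize it first.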
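
If the rest of the string should stay unicode, one possible alternative is to remove only U+200B (ZERO WIDTH SPACE, the character the console shows as <200b>). The sketch below assumes Python 3. The helper names strip_zero_width_space and to_ascii and the sample string are made up for illustration and are not part of the original answer. It also compares what the NFC and NFKD forms leave behind after an ASCII encode.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE, the character shown as <200b>


def strip_zero_width_space(text: str) -> str:
    """Remove only U+200B and leave every other character untouched."""
    return text.replace(ZWSP, "")


def to_ascii(text: str, form: str = "NFKD") -> str:
    """Normalize, drop whatever is still non-ASCII, and return a str again."""
    normalized = unicodedata.normalize(form, text)
    return normalized.encode("ascii", "ignore").decode("ascii")


# Hypothetical input: "café", two zero-width spaces, then "bar".
sample = "caf\u00e9\u200b\u200bbar"

print(unicodedata.name(ZWSP))          # official name of the invisible character
print(strip_zero_width_space(sample))  # accented letter is kept
print(to_ascii(sample, "NFC"))         # accented letter is dropped entirely
print(to_ascii(sample, "NFKD"))        # accented letter falls back to plain "e"
```

The targeted replace is the least destructive choice when the text may contain legitimate non-ASCII characters. The ASCII round trip is only appropriate when losing those characters is acceptable.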