python unicode special-characters

Removing special characters (¡) from a string


I am trying to write the contents of a collection to a file. The collection contains special characters like ¡, which cause a problem. For example, the collection has entries like:

{..., Name: ¡Hi!, ...}

When I try to write this to a file, I get the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

I have tried the solutions provided here, but in vain. It would be great if someone could help me with this :)

So the example goes like this:

I have a collection which has the following details

{ "_id":ObjectId("5428ead854fed46f5ec4a0c9"), 
   "author":null,
   "class":"culture",
   "created":1411967707.356593,
   "description":null,
   "id":"eba9b4e2-900f-4707-b57d-aa659cbd0ac9",
   "name":"¡Hola!",
   "reviews":[

   ],
   "screenshot_urls":[

   ]
}

Now I try to access the name entry by iterating over the collection, i.e.:

f = open("sample.txt","w");

for val in exampleCollection:
   f.write("%s"%str(exampleCollection[val]).encode("utf-8"))

f.close();

Solution

  • The easiest way to remove characters you don't want is to specify the characters you do want.

    >>> import string
    >>> validchars = string.ascii_letters + string.digits + ' '
    >>> s = '¡Hi there!'
    >>> clean = ''.join(c for c in s if c in validchars)
    >>> clean
    'Hi there'
    

    If some forms of punctuation are okay, add them to validchars.
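
As a rough sketch of the same whitelist idea wrapped in a reusable function: the helper name keep_allowed and its extra parameter are made up for illustration and do not come from the answer above, and the example assumes Python 3, where string literals are Unicode.

```python
import string

# Same whitelist as `validchars` above: ASCII letters, digits, and the space.
BASE_CHARS = string.ascii_letters + string.digits + ' '


def keep_allowed(text, extra=''):
    """Return text with every character outside the whitelist dropped.

    `extra` is a hypothetical parameter for punctuation you want to keep.
    """
    allowed = set(BASE_CHARS + extra)
    return ''.join(c for c in text if c in allowed)


print(keep_allowed('¡Hola!'))             # Hola
print(keep_allowed('¡Hola!', extra='!'))  # Hola!
```

Note that this drops the ¡ for good. If the goal is to keep such characters in the file rather than strip them, opening the file with an explicit encoding, e.g. open("sample.txt", "w", encoding="utf-8") in Python 3, may be a better fit.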