python json django unicode iconv

Encode Unicode characters to Unicode escape sequences


I have a CSV file containing sites along with their addresses. I need to process this file to produce a JSON file that I will use in Django to load initial data into my database. To do that, I need to convert all special (non-ASCII) characters in the CSV file to Unicode escape sequences.

Here is an example:

Örnsköldsvik;SE;Ornskoldsvik;Ångermanlandsgatan 28 A

It should be converted to:

\u00D6rnsk\u00F6ldsvik;SE;Ornskoldsvik;\u00C5ngermanlandsgatan 28 A

The following site does exactly the conversion I'm expecting: http://itpro.cz/juniconv/ but I'd like to find a way to do it from the command line (bash) or in Python. I've already tried iconv, uconv and some Python scripts without real success.

What kind of script is running behind the juniconv website?

Thank you in advance for any suggestions.


Solution

  • If you want Java-style Unicode escapes in Python, you could use the JSON format; the json module escapes non-ASCII characters by default (ensure_ascii=True):

    >>> import json
    >>> import sys
    >>> s = u'Örnsköldsvik;SE;Ornskoldsvik;Ångermanlandsgatan 28 A'
    >>> json.dump(s, sys.stdout)
    "\u00d6rnsk\u00f6ldsvik;SE;Ornskoldsvik;\u00c5ngermanlandsgatan 28 A"
    

    There is also the unicode-escape codec, but you shouldn't use it here: it produces Python-specific escaping (the way Python Unicode string literals are written), such as \xd6 instead of \u00d6, which is not valid JSON:

    >>> # decode the escaped bytes to text so this prints the same on Python 2 and 3
    >>> print(s.encode('unicode-escape').decode('ascii'))
    \xd6rnsk\xf6ldsvik;SE;Ornskoldsvik;\xc5ngermanlandsgatan 28 A