I'm a student learning Python and Scrapy (a crawler).
I want to convert a unicode string to str in Python, but this is not an ordinary unicode string: it holds Unicode escape sequences written out as text. Please see the code below.
# python 2.7
...
print(type(name[0]))
print(name[0])
print(type(keyword_name_temp))
print(keyword_name_temp)
...
When I run the script above, the console shows the following:
$ <type 'unicode'>
$ 서용교 ## these are Korean characters
$ <type 'unicode'>
$ u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
I want to see "keyword_name_temp" displayed as Korean, but I don't know how to do that.
I got both the name list and keyword_name_temp from HTML fetched with an HTTP request.
The name list was originally in string format.
keyword_name_temp was originally in unicode escape format.
Can anybody help me, please?
u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4' contains real backslashes followed by u and hex digits, not the Unicode characters U+C9C0 etc., which are commonly written using the \u escape sequence. (The backslash is an escape character in Python string literals, so the Python interpreter shows each real backslash in a string as \\.)
(Would that string happen to come from some JSON object perhaps?)
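To make the difference concrete, here is a small sketch comparing the escaped text with the real characters. The variable names `escaped` and `real` are only illustrative, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

# Escaped text: a backslash, the letter 'u' and four hex digits, six times over.
escaped = u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'

# Real text: six Korean characters, written here with \u escapes in the literal.
real = u'\uc9c0\ubc29\uc790\uce58\ub2e8\uccb4'

print(len(escaped))           # 36 (6 groups of 6 ASCII characters)
print(len(real))              # 6
print(escaped[:2] == u'\\u')  # True: the string really starts with a backslash and 'u'
print(escaped == real)        # False
```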
You can construct a JSON string out of it and use json.loads() to convert it to a unicode string:
Example in Python 2.7:
>>> s1 = u'서용교'
>>> type(s1)
<type 'unicode'>
>>> s1
u'\uc11c\uc6a9\uad50'
>>> print(s1)
서용교
>>>
>>>
>>> s2 = u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
>>> type(s2)
<type 'unicode'>
>>>
>>> # put that unicode string between double-quotes
>>> # so that json module can interpret it
>>> ts2 = u'"%s"' % s2
>>> ts2
u'"\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4"'
>>>
>>> import json
>>> json.loads(ts2)
u'\uc9c0\ubc29\uc790\uce58\ub2e8\uccb4'
>>> print(json.loads(ts2))
지방자치단체
>>>
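The JSON steps above can be wrapped in a small helper. This is only a sketch: the function name `unescape_with_json` is made up for illustration, and it assumes the input contains no unescaped double quotes or control characters, since json.loads() would raise ValueError on those.

```python
from __future__ import print_function
import json


def unescape_with_json(text):
    """Decode \\uXXXX sequences by parsing the text as a JSON string.

    Hypothetical helper; assumes `text` has no unescaped double quotes.
    """
    # Wrap the text in double quotes so it becomes a JSON string value.
    return json.loads(u'"%s"' % text)


s2 = u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
decoded = unescape_with_json(s2)
print(len(decoded))  # 6
```

On a terminal that can display Korean, print(decoded) should show the same text as the transcript above.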
Another option is to turn it into a string literal and evaluate it with ast.literal_eval():
>>> import ast
>>>
>>> # construct a string literal, with the 'u' prefix
>>> s2_literal = u'u"%s"' % s2
>>> s2_literal
u'u"\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4"'
>>> print(ast.literal_eval(s2_literal))
지방자치단체
>>>
>>> # also works with single-quotes string literals
>>> s2_literal2 = u"u'%s'" % s2
>>> s2_literal2
u"u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'"
>>>
>>> print(ast.literal_eval(s2_literal2))
지방자치단체
>>>
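A third possibility, not used in the examples above, is Python's built-in unicode_escape codec. The sketch below assumes the input is pure ASCII (otherwise the encode step raises UnicodeEncodeError), and the helper name `unescape_with_codec` is hypothetical. Note that this codec also interprets other escapes such as \n and \t, which may or may not be what you want.

```python
from __future__ import print_function


def unescape_with_codec(text):
    """Decode \\uXXXX sequences with the unicode_escape codec.

    Hypothetical helper; assumes `text` contains only ASCII characters.
    """
    # Go to bytes first, then let the codec interpret the escape sequences.
    return text.encode('ascii').decode('unicode_escape')


s2 = u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
decoded = unescape_with_codec(s2)
print(len(decoded))  # 6
```

Written this way, the same call should work on both Python 2.7 and Python 3.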