I'm new to Python and I'm trying to encode a UTF-8 string. Using PHP's json_encode(), … (U+2026) becomes \u2026. However, using Python's json.dumps(), it becomes \u00e2\u20ac\u00a6. How do I convert this to \u2026 in Python?
Here's the entire program:
import nltk
import json
file=open('pos_tag.txt','r')
tags=nltk.pos_tag(nltk.word_tokenize(file.read()))
print(json.dumps(tags,separators=(',',':')))
The problem lay in open(), which was not decoding the file as UTF-8. I was able to fix it using the codecs module:
import nltk
import json
import codecs
file=codecs.open('pos_tag.txt','r','utf-8')
tags=nltk.pos_tag(nltk.word_tokenize(file.read()))
print(json.dumps(tags,separators=(',',':')))
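For anyone hitting the same thing: on Python 3 the built-in open() takes an encoding argument, so codecs isn't strictly required. Below is a minimal sketch that reproduces the symptom and the fix. It assumes Python 3 and that the wrong codec was cp1252 (which is what the \u00e2\u20ac\u00a6 output corresponds to); the temporary file is only a stand-in for pos_tag.txt.

```python
import json
import os
import tempfile

# U+2026 (horizontal ellipsis) stored on disk as the UTF-8 bytes E2 80 A6.
# The temp file is a hypothetical stand-in for pos_tag.txt.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write('\u2026'.encode('utf-8'))
    path = tmp.name

# Decoding with the wrong codec (cp1252 assumed here) turns one character into three.
with open(path, 'r', encoding='cp1252') as f:
    wrong = json.dumps(f.read())

# Decoding explicitly as UTF-8 recovers the original character.
with open(path, 'r', encoding='utf-8') as f:
    right = json.dumps(f.read())

os.remove(path)
print(wrong)  # "\u00e2\u20ac\u00a6"
print(right)  # "\u2026"
```

Passing the encoding explicitly also keeps the result from depending on whatever default the platform happens to use.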