Getting wrong Unicode from Habanero API in Python


I am calling the Habanero API, which is a front end to CrossRef, with this code:

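import json
from habanero import cn

# Request a formatted citation for the DOI via content negotiation.
x = cn.content_negotiation(ids="10.1051/0004-6361/201628812",
                           format="text", style="elsevier-harvard")

# Print the citation as text, then as JSON to expose the exact code points.
print(u'{0}'.format(x))
print(json.dumps(x, indent=4, sort_keys=True))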

I get the following output (emphasis mine):

Hawkins, K., Masseron, T., Jofré, P., Gilmore, G., Elsworth, Y., Hekker, S., 2016. An accurate and self-consistent chemical abundance catalogue for the APOGEE/Keplersample. Astronomy & Astrophysics 594, A43.

"Hawkins, K., Masseron, T., Jofr\u00c3\u00a9, P., Gilmore, G., Elsworth, Y., Hekker, S., 2016. An accurate and self-consistent chemical abundance catalogue for the APOGEE/Keplersample. Astronomy & Astrophysics 594, A43.\n"

The third author's name should be Jofré, so the final character should be \u00e9 (whose UTF-8 encoding is the two bytes C3 A9). The JSON dump shows that I am getting the two characters \u00c3\u00a9 instead, as if each UTF-8 byte had been turned into its own character. Am I doing something wrong in either requesting or decoding the response?


Solution

  • This has been fixed in the habanero git repo. h/t @sckott.
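
For anyone stuck on a version that still shows the problem: the output in the question is consistent with UTF-8 bytes being decoded as Latin-1 (ISO-8859-1), which turns the two bytes of é (C3 A9) into the two characters \u00c3\u00a9. The sketch below reproduces that effect and reverses it. It assumes this is the actual cause, which the question does not confirm, and `fix_mojibake` is a hypothetical helper written for illustration, not part of habanero.

```python
def fix_mojibake(text):
    """Hypothetical helper: undo UTF-8 bytes that were decoded as Latin-1."""
    try:
        # Latin-1 maps code points 0-255 straight back to the original bytes,
        # which can then be decoded with the encoding they were written in.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way; leave it alone.
        return text


correct = "Jofr\u00e9"

# Reproduce the symptom: encode as UTF-8, then decode with the wrong codec.
garbled = correct.encode("utf-8").decode("latin-1")
print(ascii(garbled))  # 'Jofr\xc3\xa9'

# Reverse it.
print(fix_mojibake(garbled) == correct)  # True
```

This would be applied to the string returned by `cn.content_negotiation` before printing it. It is only a workaround; upgrading to a habanero release that includes the fix is the cleaner route.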