I'm using the Alchemy API on App Engine, so I'm using the simplejson library to parse responses. The problem is that the responses contain entries that share the same name:
    {
        "status": "OK",
        "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
        "url": "",
        "language": "english",
        "entities": [
            {
                "type": "Person",
                "relevance": "0.33",
                "count": "1",
                "text": "Michael Jordan",
                "disambiguated": {
                    "name": "Michael Jordan",
                    "subType": "Athlete",
                    "subType": "AwardWinner",
                    "subType": "BasketballPlayer",
                    "subType": "HallOfFameInductee",
                    "subType": "OlympicAthlete",
                    "subType": "SportsLeagueAwardWinner",
                    "subType": "FilmActor",
                    "subType": "TVActor",
                    "dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
                    "freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
                    "umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
                    "opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
                    "yago": "http://mpii.de/yago/resource/Michael_Jordan"
                }
            }
        ]
    }
So the problem is that "subType" is repeated, so the dict that loads returns holds only "TVActor" for that key rather than a list of all the values. Is there any way around this?
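The behaviour can be reproduced with a trimmed-down payload. This is only a minimal sketch: it uses the standard library's json module in place of simplejson (an assumption; the two are expected to behave the same way here), and the sample string is a cut-down stand-in for the response above.

```python
import json

# Cut-down sample with a repeated "subType" name (not the full API response).
sample = '{"name": "Michael Jordan", "subType": "Athlete", "subType": "TVActor"}'

parsed = json.loads(sample)

# The default decoder builds a plain dict, so each repeated name
# overwrites the previous value and only the last one survives.
print(parsed["subType"])
```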
RFC 4627, which defines application/json, says:
An object is an unordered collection of zero or more name/value pairs
And:
The names within an object SHOULD be unique.
This means that AlchemyAPI should not return multiple "subType" names inside the same object and call the result JSON.
You could request the same data in XML format (outputMode=xml) to avoid ambiguity in the results, or convert the values of duplicate keys into lists:
    import simplejson as json
    from collections import defaultdict

    def multidict(ordered_pairs):
        """Convert the values of duplicate keys to lists."""
        # Read all values into lists.
        d = defaultdict(list)
        for k, v in ordered_pairs:
            d[k].append(v)
        # Unpack lists that have only one item.
        for k, v in d.items():
            if len(v) == 1:
                d[k] = v[0]
        return dict(d)

    # Example input with a repeated "subType" name.
    text = """{
        "type": "Person",
        "subType": "Athlete",
        "subType": "AwardWinner"
    }"""

    print(json.JSONDecoder(object_pairs_hook=multidict).decode(text))

Output:

    {u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}
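One consequence of this approach is that a key maps to a plain value when it appears once and to a list when it repeats, so calling code has to handle both shapes. A possible variation, sketched here with the standard library's json module (assumed to accept the same object_pairs_hook argument as simplejson), always returns a list for keys named up front. The names make_hook and list_keys are made up for this illustration.

```python
import json

def make_hook(list_keys):
    """Build an object_pairs_hook that always collects the given keys into lists."""
    def hook(ordered_pairs):
        result = {}
        for key, value in ordered_pairs:
            if key in list_keys:
                # Always a list, even when the key occurs only once.
                result.setdefault(key, []).append(value)
            else:
                # Other keys keep the usual "last value wins" behaviour.
                result[key] = value
        return result
    return hook

# Hypothetical input in which "subType" happens to appear only once.
single = '{"type": "Person", "subType": "Athlete"}'

decoder = json.JSONDecoder(object_pairs_hook=make_hook({"subType"}))
print(decoder.decode(single))
```

The trade-off is that the set of list-valued keys has to be known in advance, whereas the generic hook needs no configuration.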