python json simplejson

simplejson handling of same-named entries


I'm using AlchemyAPI on App Engine, so I'm parsing its responses with the simplejson library. The problem is that the responses contain entries that share the same name:

 {
    "status": "OK",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "",
    "language": "english",
    "entities": [
        {
            "type": "Person",
            "relevance": "0.33",
            "count": "1",
            "text": "Michael Jordan",
            "disambiguated": {
                "name": "Michael Jordan",
                "subType": "Athlete",
                "subType": "AwardWinner",
                "subType": "BasketballPlayer",
                "subType": "HallOfFameInductee",
                "subType": "OlympicAthlete",
                "subType": "SportsLeagueAwardWinner",
                "subType": "FilmActor",
                "subType": "TVActor",
                "dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
                "freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
                "umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
                "opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
                "yago": "http://mpii.de/yago/resource/Michael_Jordan"
            }
        }
    ]
}

The problem is that "subType" is repeated, so the dict that loads() returns holds only "TVActor" for that key rather than a list of all the values. Is there any way to work around this?


Solution

  • RFC 4627, which defines application/json, says:

    An object is an unordered collection of zero or more name/value pairs
    

    And:

    The names within an object SHOULD be unique.
    

    In other words, AlchemyAPI should not return multiple "subType" names inside the same object and call the result JSON.
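To illustrate what happens with duplicate names by default, here is a minimal sketch. It uses the standard-library json module as a stand-in for simplejson (an assumption made so the snippet runs without extra packages; the behaviour described in the question is the same), and the sample string is made up for the demonstration:

```python
import json

# Two "subType" entries inside the same object, as in the AlchemyAPI response.
text = '{"type": "Person", "subType": "Athlete", "subType": "AwardWinner"}'

# With no hook, the decoder builds a plain dict, so each repeated key
# overwrites the previous one and only the last value is kept.
data = json.loads(text)
print(data["subType"])
```

Only the final "subType" value survives, which matches the "TVActor" result described in the question.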

    You could either request the response in XML format (outputMode=xml) to avoid the ambiguity, or convert the values of duplicate keys into lists while parsing:

    import simplejson as json
    from collections import defaultdict
    
    def multidict(ordered_pairs):
        """Convert the values of duplicate keys to lists."""
        # read all values into lists
        d = defaultdict(list)
        for k, v in ordered_pairs:
            d[k].append(v)
    
        # unpack lists that have only 1 item
        for k, v in d.items():
            if len(v) == 1:
                d[k] = v[0]
        return dict(d)
    
    # `text` is the JSON string to parse (see the example below)
    print(json.JSONDecoder(object_pairs_hook=multidict).decode(text))
    

    Example

    text = """{
      "type": "Person",
      "subType": "Athlete",
      "subType": "AwardWinner"
    }"""
    

    Output

    {u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}
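The same hook is called for every object in the document, including nested ones, so it also works on the full response from the question. The following is a sketch under a few assumptions: it uses the standard-library json module (which accepts object_pairs_hook from Python 2.7 on) instead of simplejson, the response is trimmed to a few fields, and the variable names response, data and sub_types are only illustrative.

```python
import json
from collections import defaultdict


def multidict(ordered_pairs):
    """Collect the values of duplicate keys into lists."""
    grouped = defaultdict(list)
    for key, value in ordered_pairs:
        grouped[key].append(value)
    # Keys that occurred only once keep their plain value.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


# Trimmed-down copy of the response shown in the question.
response = """{
    "status": "OK",
    "entities": [
        {
            "type": "Person",
            "text": "Michael Jordan",
            "disambiguated": {
                "name": "Michael Jordan",
                "subType": "Athlete",
                "subType": "AwardWinner",
                "subType": "BasketballPlayer"
            }
        }
    ]
}"""

# The hook runs once per JSON object, innermost objects first.
data = json.loads(response, object_pairs_hook=multidict)
sub_types = data["entities"][0]["disambiguated"]["subType"]
print(sub_types)
```

One caveat with this approach: a key that appears only once yields a plain value, while a repeated key yields a list, so code reading "subType" may need to handle both shapes.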