Tags: python, json, pandas, numpy, series

Python compute average value of key in series of JSON


I have a pandas.core.series.Series where each element is a JSON object, as shown:

0     {"count": 157065, "grp": {"a1": 12, "a2": 32}}
1     {"count": 2342, "grp": {"a1": 4, "a2": 34}}
2     {"count": 543, "grp": {"a1": 1, "a2": 11}}
3     {"count": 156, "grp": {"a1": 56, "a2": 75}}

How can I compute the average value of count across all the JSON objects, and also the average values of a1 and a2?


Solution

  • I'm not entirely sure whether this is what you were asking for.

    This calculates the average of "count":

    doc1 = {"count": 157065, "grp": {"a1": 12, "a2": 32}}
    doc2 = {"count": 2342, "grp": {"a1": 4, "a2": 34}}
    doc3 = {"count": 543, "grp": {"a1": 1, "a2": 11}}
    doc4 = {"count": 156, "grp": {"a1": 56, "a2": 75}}
    lojs = [doc1, doc2, doc3, doc4] # list of all the jsons
    
    countaverage = 0
    # Add each document's "count" to the running total
    for j in lojs:
        countaverage += j["count"]
    # Divide the total by the number of documents
    countaverage = countaverage/len(lojs)
    

    If you want the average of a1 as well as, or instead of, the one above, you could use this code:

    a1average = 0
    for j in lojs:
        a1average += j["grp"]["a1"] # getting "a1" inside of "grp"
    a1average = a1average/len(lojs)
    

    You can swap a1 for a2 to get the average of a2.
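
Since the question starts from a pandas Series, the same three averages can also be computed without explicit loops. This is only a sketch, not the method used above: it assumes pandas 1.0 or later (for pd.json_normalize) and that each element of the Series is either a dict or a JSON string. The names s, records, flat and means are made up for illustration.

```python
import json

import pandas as pd

# Sample Series; the elements are assumed to be JSON strings here
s = pd.Series([
    '{"count": 157065, "grp": {"a1": 12, "a2": 32}}',
    '{"count": 2342, "grp": {"a1": 4, "a2": 34}}',
    '{"count": 543, "grp": {"a1": 1, "a2": 11}}',
    '{"count": 156, "grp": {"a1": 56, "a2": 75}}',
])

# Parse strings into dicts; elements that are already dicts pass through
records = [json.loads(x) if isinstance(x, str) else x for x in s]

# Flatten nested keys into columns named "count", "grp.a1", "grp.a2"
flat = pd.json_normalize(records)

# Column-wise mean; missing values (NaN) are skipped by default
means = flat.mean()

print(means["count"])   # 40026.5
print(means["grp.a1"])  # 18.25
print(means["grp.a2"])  # 38.0
```

Because missing keys become NaN and the mean skips NaN by default, this should also cope with documents that do not all share the same "a" keys.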

    EXTENSION: for documents that might have different numbers of "a" keys:

    doc1 = {"count": 157065, "grp": {"a1": 12, "a2": 32}}
    doc2 = {"count": 2342, "grp": {"a1": 4, "a2": 34}}
    doc3 = {"count": 543, "grp": {"a1": 1, "a2": 11, "a3": 46, "a4": 23}}
    doc4 = {"count": 156, "grp": {"a1": 56, "a2": 75, "a3": 23}}
    lojs = [doc1, doc2, doc3, doc4]
    
    grps = [] # list that will contain every distinct "a" key
    for doc in lojs: # go through each document in the list of documents
        for a in doc["grp"].keys(): # go through the keys in that document's grp
            if a not in grps: # skip any "a" that is already in the list
                grps.append(a) # add the new "a" to the list
    
    averages = {} # dict mapping each "a" to a [sum, count] pair
    for grp in grps: # for each "a"
        averages[grp] = [0, 0] # start with a sum of 0 and a count of 0
    
    for grp in grps: # for each "a"
        for doc in lojs: # for each document
            if grp in doc["grp"].keys(): # only count documents that contain this "a"
                averages[grp][0] += doc["grp"][grp] # add this document's value to the running sum
                averages[grp][1] += 1 # increase the number of times this "a" has occurred by 1
    
    for el in averages: # for each "a"
        averages[el][0] = averages[el][0]/averages[el][1] # divide the sum by the count to get the average
    

    You can then get the value of each average using:

    averages["a3"][0]

    Of course, you can change "a3" to whichever "a" you want. You take the first element because the value stored under each key is a list that holds both the average and the number of times that "a" occurred in your documents.

    This probably isn't the most efficient way, but I mean, it works!
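
If efficiency matters, the separate loops above can be collapsed into a single pass over the documents. The following is a sketch of one possible variant using collections.defaultdict from the standard library. The names totals, counts and averages_by_key are hypothetical, and it reuses the sample documents from the extension.

```python
from collections import defaultdict

lojs = [
    {"count": 157065, "grp": {"a1": 12, "a2": 32}},
    {"count": 2342, "grp": {"a1": 4, "a2": 34}},
    {"count": 543, "grp": {"a1": 1, "a2": 11, "a3": 46, "a4": 23}},
    {"count": 156, "grp": {"a1": 56, "a2": 75, "a3": 23}},
]

totals = defaultdict(int)  # running sum per "a" key
counts = defaultdict(int)  # number of documents that contain each key

# One pass: every key seen in any grp is added to both dicts as it appears
for doc in lojs:
    for key, value in doc["grp"].items():
        totals[key] += value
        counts[key] += 1

# Divide each sum by the number of documents that actually had that key
averages_by_key = {key: totals[key] / counts[key] for key in totals}

print(averages_by_key["a3"])  # 34.5
```

Here each value is the average itself, so there is no [0] index to remember.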