Search code examples
pythondataframedictionarylines-of-code

traverse a list of dictionaries and store them into a separate column


https://github.com/ajayvd/DOUBT_SO/blob/main/doubt.csv

The above link contains the data frame and it contains 2 columns response_value & response_errors & The response_errors contains the value in list of dictionaries .

problem statement

so here i want to iterate only the provider key and store them next to it in a column in form of list

for example :

for row 1 it will be [ shipt ]

for row 2 [ shipt ]

for row 3 [ doordash , pickupnow , roadie ]

enter image description here

you would be required to do preprocessing , so i am attaching the snippet below .

def parse_json(webhook_data):
    try:
        data = json.loads(webhook_data)
    except ValueError as ex:
        data={'estimate':'error'}
    return data


data_metering['RESPONSE_ERRORS']=data_metering['RESPONSE_ERRORS'].apply(parse_json)

custom function which i code , please modify the code and tell me the optimised way to do it.

def traverse_dsp(data_frame,column):
    provider = []
    

    #Iterate over each row in Dataframe
    for index, row in data_frame.iterrows():
       # Iterate over each json object in each row in DataFrame
        for i in range(0,len(row[column])):

            for k,v in row[column][i].items():

                if k=="provider":

                    provider_val=v
                    
                    x.loc[index,"provider_list"]=provider
                    
           
x=data_metering
traverse_dsp(x,"RESPONSE_ERRORS")


Solution

  • IIUC, you can try this :

    from ast import literal_eval
    ​
    df_ = pd.read_csv("https://raw.githubusercontent.com/ajayvd/DOUBT_SO/main/doubt.csv")
    ​
    ser = df_["RESPONSE_ERRORS"].apply(literal_eval).explode()
    ​
    df = (df_.join(pd.DataFrame(ser.tolist(), index=ser.index)
                           .groupby(level=0)["provider"].agg(list)))
    

    Output :

    print(df["provider"])
    
    0                          [Shipt]
    1                          [Shipt]
    2    [DoorDash, PickupNow, Roadie]
    3                       [DoorDash]
    4                       [DoorDash]
    Name: provider, dtype: object
    
    print(df)
    
                         RESPONSE_VALUE                                                                                                                                                                                                                                                                                                                                                                                                                                        RESPONSE_ERRORS provider_list                       provider
    0                         ['Shipt']                                                                                                                                                                                                                                                                                                                               [{'code': 0, 'message': "Shipt doesn't provide estimates", 'provider': 'Shipt', 'type': 'DSP_DOES_NOT_OFFER_ESTIMATES'}]         shipt                        [Shipt]
    1                         ['Shipt']                                                                                                                                                                                                                                                                                                                               [{'code': 0, 'message': "Shipt doesn't provide estimates", 'provider': 'Shipt', 'type': 'DSP_DOES_NOT_OFFER_ESTIMATES'}]           NaN                        [Shipt]
    2                         ['FedEx']  [{'code': 400, 'message': 'Distance from pickup address to dropoff address exceeds 15 miles (24 kilometers): dropoff_address', 'provider': 'DoorDash', 'type': 'MAX_DISTANCE_EXCEEDED'}, {'code': 0, 'message': 'Not serviceable, Reason: Mileage', 'provider': 'PickupNow', 'type': 'UNAVAILABLE'}, {'code': 400, 'message': 'Route Sorry, distance in passenger vehicles is limited to 30 miles.', 'provider': 'Roadie', 'type': 'UNMAPPED_ERROR'}]           NaN  [DoorDash, PickupNow, Roadie]
    3  ['PickupNow', 'FedEx', 'Roadie']                                                                                                                                                                                                                                                               [{'code': 400, 'message': 'Distance from pickup address to dropoff address exceeds 15 miles (24 kilometers): dropoff_address', 'provider': 'DoorDash', 'type': 'MAX_DISTANCE_EXCEEDED'}]           NaN                     [DoorDash]
    4                        ['Roadie']                                                                                                                                                                                                                                                               [{'code': 400, 'message': 'Distance from pickup address to dropoff address exceeds 15 miles (24 kilometers): dropoff_address', 'provider': 'DoorDash', 'type': 'MAX_DISTANCE_EXCEEDED'}]           NaN                     [DoorDash]