Search code examples
pythonjsonpandasdataframejson-normalize

Parse a JSON column in a df and extract specific key value


I have a pandas DataFrame containing one column with a nested JSON dict. I want to normalize the JSON column ('media') and extract the value for the key 'url' when it is present. The 'media' json payload has three types of possible media objects all included in the example data set. I need to extract from the 'MessageMediaWebPage' object, only.

The typical error (although there is some variation) after using

# Using JSON normalize function

pd.json_normalize(df['media'], max_level=1)

AttributeError: 'str' object has no attribute 'values'

The full error is listed below

Data and code example in a JSON format. The example data set only consists of three records but is too large to post directly. The link is to my git:

# The json is nested
# The full data set consists of several columns and 40K + records. This is just a small slice.

df = pd.read_json('https://raw.githubusercontent.com/whitedl/telegram/main/df_3.json', dtype = {'telegram_id':object}) 

df.info()

pd.json_normalize(df['media'], max_level=1)

As background I have tried the following solutions:


Solution

  • The problem is that value in media column is string type. You can apply ast.literal_eval to media column to convert it value to python dict.

    import ast 
    
    
    pd.json_normalize(df['media'].apply(ast.literal_eval), max_level=1)