python json pandas dictionary eval

How can I handle missing values when converting a string dictionary to a dictionary with eval() in Python?


I need to convert the 'content' column from a string dictionary to a dictionary in Python. After that I will use the following line of code:

df['content'].apply(pd.Series)

so that the dictionary keys become the column names and the dictionary values go in the cells.
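
For illustration, here is a minimal sketch of what that step does once the column holds real dictionaries. The sample data and the keys `a` and `b` are made up; the real 'content' column will have other keys.

```python
import pandas as pd

# Hypothetical sample data: the column already holds dictionaries, not strings.
df = pd.DataFrame({
    "id": [1, 2],
    "content": [{"a": 1, "b": 2}, {"a": 3, "b": 4}],
})

# Each dictionary becomes one row; each key becomes one column.
expanded = df["content"].apply(pd.Series)
print(expanded)
```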

I can’t do this now because there are missing values in the dictionary string.
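
A small sketch of the failure, assuming a missing value looks like a key with nothing after the colon. The exact shape of the data is only visible in the screenshot, so the sample string is an assumption.

```python
# Hypothetical string dictionary with a missing value for "b"
raw = '{"a": 1, "b": }'

try:
    eval(raw)
    error = None
except SyntaxError as exc:
    # The string is not a valid Python expression, so eval() refuses it.
    error = exc

print(type(error).__name__)
```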

How can I handle these missing values when I convert the string dictionary to a dictionary with eval()?

[I'm working on the 'content' column, which I want to convert to the correct format first. I tried the eval() function, but it doesn't work because there are missing values. This is JSON data.

My goal is to have the keys from the content column as the column titles and the values in the cells.](https://i.sstatic.net/1CsIl.png)


Solution

  • You can use json.loads in a lambda function. If the row value is NaN, keep it as NaN; otherwise apply json.loads:

    import json
    import numpy as np
    import pandas as pd

    # Parse each JSON string; leave missing rows as NaN
    df['content'] = df['content'].apply(lambda x: json.loads(x) if pd.notna(x) else np.nan)
    
    

    Now you can expand the dictionaries into columns with pd.Series:

    # The column name is case-sensitive: use 'content', as in the step above
    v1 = df['content'].apply(pd.Series)
    df = df.drop(['content'], axis=1).join(v1)
    
    

    If values are missing inside the dictionary strings themselves:

    import ast
    import json

    import numpy as np
    import pandas as pd

    def fill_missing(text, placeholder='"None"'):
        # Append a placeholder to every "key:" pair that has no value.
        # Replace "None" with whatever default you want.
        # Naive: assumes flat dictionaries with no commas or braces inside values.
        parts = text.replace('{', '').replace('}', '').split(',')
        for i, part in enumerate(parts):
            if not part.partition(':')[-1].strip():
                parts[i] = part + placeholder
        return '{' + ','.join(parts) + '}'

    def check_json(x):
        if not isinstance(x, str):
            return np.nan  # NaN, None, or anything that is not a string
        try:
            return json.loads(x)  # well-formed JSON
        except ValueError:
            pass
        try:
            return json.loads(fill_missing(x))  # JSON with missing values
        except ValueError:
            pass
        try:
            # Single-quoted (Python-style) dictionary with missing values
            return ast.literal_eval(fill_missing(x.replace("'", '"')))
        except Exception:
            print("Could not parse json object. Returning nan")
            return np.nan
    
    df['content'] = df['content'].apply(check_json)

    # The column name is case-sensitive: use 'content' throughout
    v1 = df['content'].apply(pd.Series)
    df = df.drop(['content'], axis=1).join(v1)
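
As a rough, self-contained sketch of the same idea on made-up data: the regular expression below is a swapped-in shortcut, not the function above. It assumes a missing value always looks like a key followed by a colon and nothing else, in a flat dictionary with no colons inside string values. The helper name `parse_content` and the sample keys are hypothetical. It fills the gaps with JSON `null` and builds the new columns only from the rows that parsed.

```python
import json
import re

import numpy as np
import pandas as pd

# A colon followed only by whitespace before "," or "}" marks a missing value.
MISSING_VALUE = re.compile(r':\s*(?=[,}])')


def parse_content(text):
    """Parse one JSON string, filling missing values with null (None)."""
    if not isinstance(text, str):
        return np.nan  # NaN, None, or anything that is not a string
    try:
        return json.loads(MISSING_VALUE.sub(': null', text))
    except ValueError:
        return np.nan  # still not valid JSON after filling the gaps


# Hypothetical sample data
df = pd.DataFrame({
    'id': [1, 2, 3],
    'content': ['{"a": 1, "b": }', np.nan, '{"a": , "b": 4}'],
})

parsed = df['content'].apply(parse_content)

# Build the new columns from the parsed rows only, keeping their index,
# then join them back so unparsed rows get NaN in every new column.
valid = parsed.dropna()
expanded = pd.DataFrame(valid.tolist(), index=valid.index)
df = df.drop(columns='content').join(expanded)
print(df)
```

Filling with `null` rather than the string "None" keeps the gaps recognisable as missing values in pandas; swap in another default in the `sub` call if you prefer.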