Tags: python, pandas, dictionary, lambda, key-value

Dictionary where some keys have multiple values


I have a pandas DataFrame, and I would like to create one or more new columns based on the dictionary values.

Here is my df and dictionary:

import pandas as pd

data = ['One', 'Two', 'Three', 'Four']

df = pd.DataFrame(data, columns=['Count'])

dictionary = {'One':'Red', 'Two':['Red', 'Blue'], 'Three':'Green','Four':['Green','Red', 'Blue']}

This is the result I would like to achieve:

[Image: desired output table, not available]

Preferably with blank fields instead of None values. Does anybody know a way?

I tried the following:

df = pd.DataFrame([(k, *v) for k, v in dictionary.items()])
df.columns = ['name'] + [f'n{x}' for x in df.columns[1:]]
df

However, for keys with a single value, the string itself is split letter by letter across the columns (for example, 'Red' becomes 'R', 'e', 'd'). [Image of this output not available]

A solution that maps the values to a single column, separated by a delimiter (,), would also be helpful.


Solution

  • Some dictionary values are lists and others are plain strings, so an if-else is needed to prevent strings from being unpacked by *:

    df = pd.DataFrame([(k, *v) 
                       if isinstance(v, list) 
                       else (k, v) for k, v in dictionary.items()])
    df.columns = ['name'] + [f'n{x}' for x in df.columns[1:]]
    print (df)
        name     n1    n2    n3
    0    One    Red  None  None
    1    Two    Red  Blue  None
    2  Three  Green  None  None
    3   Four  Green   Red  Blue
    

    Details:

    print (((*'Red', )))
    ('R', 'e', 'd')
    
    print (((*['Red', 'Blue'], )))
    ('Red', 'Blue')
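The if-else can also be avoided by first normalizing every value to a list, so that * always unpacks a list and never a string. A minimal sketch, assuming the question's dictionary; as_list is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

dictionary = {'One': 'Red', 'Two': ['Red', 'Blue'],
              'Three': 'Green', 'Four': ['Green', 'Red', 'Blue']}

def as_list(v):
    """Hypothetical helper: wrap a single string in a list, leave lists unchanged."""
    return v if isinstance(v, list) else [v]

# every row is [key, value1, value2, ...], so no string is split into letters;
# shorter rows are padded with None, which fillna('') turns into blanks
df = pd.DataFrame([[k, *as_list(v)] for k, v in dictionary.items()]).fillna('')
df.columns = ['name'] + [f'n{x}' for x in df.columns[1:]]
print(df)
```

This produces the same columns (name, n1, n2, n3) as the if-else version; the difference is only where the string/list distinction is handled.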
    

    Preferably with blank fields instead of None values. Does anybody know a way?

    Add DataFrame.fillna('') to replace the missing values with empty strings:

    df = pd.DataFrame([(k, *v) 
                       if isinstance(v, list) 
                       else (k, v) 
                       for k, v in dictionary.items()]).fillna('')
    df.columns = ['name'] + [f'n{x}' for x in df.columns[1:]]
    print (df)
        name     n1    n2    n3
    0    One    Red            
    1    Two    Red  Blue      
    2  Three  Green            
    3   Four  Green   Red  Blue
    

    If the original DataFrame df_orig has the same number of rows, in the same order as the dictionary, use DataFrame.join.

    If df_orig has an index other than the default RangeIndex, pass index=df_orig.index to the DataFrame constructor so that the indices match:

    df = pd.DataFrame([(k, *v) 
                       if isinstance(v, list) 
                       else (k, v) 
                       for k, v in dictionary.items()], index=df_orig.index).fillna('')
    df.columns = ['name'] + [f'n{x}' for x in df.columns[1:]]
    
    df = df_orig.join(df)
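For the question's data, where df_orig is the original frame with its single Count column and the default RangeIndex, the join could look like this. It is a sketch that assumes the dictionary's insertion order matches the row order of df_orig:

```python
import pandas as pd

dictionary = {'One': 'Red', 'Two': ['Red', 'Blue'],
              'Three': 'Green', 'Four': ['Green', 'Red', 'Blue']}
df_orig = pd.DataFrame(['One', 'Two', 'Three', 'Four'], columns=['Count'])

# build the wide frame on the same index as df_orig so join aligns row by row
df = pd.DataFrame([(k, *v) if isinstance(v, list) else (k, v)
                   for k, v in dictionary.items()], index=df_orig.index).fillna('')
df.columns = ['name'] + [f'n{x}' for x in df.columns[1:]]

df = df_orig.join(df)
print(df)
```

Because join aligns on the index rather than on the values, rows would be silently mismatched if the order differed; matching on the name column with merge avoids that.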
    

    If you need to match rows by the name column instead, use a left join with DataFrame.merge:

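    # assumes df_orig has a 'name' column; with the question's df use
    # df_orig.merge(df, left_on='Count', right_on='name', how='left')
    df = df_orig.merge(df, on='name', how='left')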
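The question also asks for all values in a single column separated by a delimiter. A minimal sketch, assuming the question's df and dictionary: join the list values into one string first, then look up each Count with Series.map. The column name colors and the ', ' separator are illustrative choices:

```python
import pandas as pd

dictionary = {'One': 'Red', 'Two': ['Red', 'Blue'],
              'Three': 'Green', 'Four': ['Green', 'Red', 'Blue']}
df = pd.DataFrame(['One', 'Two', 'Three', 'Four'], columns=['Count'])

# turn every value into one string: lists are joined, plain strings are kept as-is
joined = {k: ', '.join(v) if isinstance(v, list) else v
          for k, v in dictionary.items()}

# map looks up each Count in the dict; keys missing from it give NaN, replaced by ''
df['colors'] = df['Count'].map(joined).fillna('')
print(df)
```

Since map matches on the Count values themselves, this does not depend on the row order of df.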