How to create a summary table from a dictionary of lists with different lengths?


My input is this dict:

response = {
    'A': ['CATEGORY 2'],
    'B': ['CATEGORY 1', 'CATEGORY 2'],
    'C': [],
    'D': ['CATEGORY 3'],
}

And I'm trying to build this kind of DataFrame:

|  ITEM  |  CATEGORY 1  |  CATEGORY 2  |  CATEGORY 3  |
|   A    |              |      x       |              |
|   B    |      x       |      x       |              |
|   C    |              |              |              |
|   D    |              |              |      x       |

I wrote the code below for that, but the result was not what I expected:

import pandas as pd

df = pd.DataFrame.from_dict(response, orient='index').fillna('x')
df = df.reset_index()
df = df.rename(columns={'index': 'ITEM'})
print(df)

  ITEM           0           1
0    A  CATEGORY 2           x
1    B  CATEGORY 1  CATEGORY 2
2    C           x           x
3    D  CATEGORY 3           x

Do you have a solution for that? I'm open to any suggestions.


Solution

  • Using str.get_dummies:

    out = (pd.Series(map('|'.join, response.values()), index=response.keys())
             .rename_axis('ITEM')
             .str.get_dummies()
             #.replace({0: '', 1: 'x'}) # uncomment for Xs
             .reset_index()
          )
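
To see why this works, it can help to look at the intermediate values. The sketch below splits the chain into two steps; the names `joined` and `dummies` are only illustrative and do not appear in the answer. It relies on `str.get_dummies` splitting each string on its default separator `'|'`, so an empty list (joined to `''`) ends up as a row of zeros.

```python
import pandas as pd

response = {
    'A': ['CATEGORY 2'],
    'B': ['CATEGORY 1', 'CATEGORY 2'],
    'C': [],
    'D': ['CATEGORY 3'],
}

# Step 1: one '|'-delimited string per item; an empty list becomes ''.
joined = pd.Series({item: '|'.join(cats) for item, cats in response.items()})
print(joined.tolist())
# ['CATEGORY 2', 'CATEGORY 1|CATEGORY 2', '', 'CATEGORY 3']

# Step 2: split each string on '|' and build one indicator column per label.
dummies = joined.str.get_dummies()
print(dummies)
```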
    

    Or, using a dictionary comprehension:

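    out = (pd.DataFrame
      .from_dict({k: {v: 'x' for v in l} for k, l in response.items()},
                 orient='index')
      # make sure items with no categories are present, in input order
      .reindex(index=list(response)).sort_index(axis=1)
      .fillna('')  # blank cells instead of NaN
      .rename_axis('ITEM').reset_index()
    )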
    

    Output (from str.get_dummies):

      ITEM  CATEGORY 1  CATEGORY 2  CATEGORY 3
    0    A           0           1           0
    1    B           1           1           0
    2    C           0           0           0
    3    D           0           0           1
    

    With Xs:

      ITEM CATEGORY 1 CATEGORY 2 CATEGORY 3
    0    A                     x           
    1    B          x          x           
    2    C                                 
    3    D                                x
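
For comparison, here is a minimal sketch that builds the same table by testing membership explicitly, without joining strings. It also shows why the attempt in the question failed: `from_dict(..., orient='index')` on a dict of lists places the list values positionally (columns `0`, `1`), whereas here each column is a category name. The names `categories` and `rows` are illustrative, not part of the answer above.

```python
import pandas as pd

response = {
    'A': ['CATEGORY 2'],
    'B': ['CATEGORY 1', 'CATEGORY 2'],
    'C': [],
    'D': ['CATEGORY 3'],
}

# All distinct categories, sorted so the column order is deterministic.
categories = sorted({cat for cats in response.values() for cat in cats})

# One row per item: 'x' where the item has the category, '' otherwise.
rows = [
    {'ITEM': item, **{cat: 'x' if cat in cats else '' for cat in categories}}
    for item, cats in response.items()
]

out = pd.DataFrame(rows, columns=['ITEM', *categories])
print(out)
```

This is more verbose than `str.get_dummies`, but it does not depend on the category names being free of the `'|'` separator.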