python pandas data-analysis

pandas: apply a function to the many columns of a large DataFrame to return multiple rows


Taking the idea from this answer: pandas: apply function to DataFrame that can return multiple rows

In my case, I have something like this, but larger:

    df = pd.DataFrame({'Restaurants': ['A', 'B', 'C'], 
                       'Tables':['D', 'E', 'F'], 
                       'Chairs': ['G', 'H', 'I'], 
                       'Menus': ['J', 'K', 'L'], 
                       'Fridges': ['M', 'N', 'O'], 
                       'Etc...': ['P', 'Q', 'R'], 'count':[3, 2, 3]})

       Restaurants  Tables  Chairs  Menus  Fridges  Etc...  count
    0            A       D       G      J        M       P      3
    1            B       E       H      K        N       Q      2
    2            C       F       I      L        O       R      3

and I would like to modify this:

    def f(group):
        # .irow() was removed from pandas; .iloc[0] selects the first row by position
        row = group.iloc[0]
        return pd.DataFrame({'class': [row['class']] * row['count']})
    df.groupby('class', group_keys=False).apply(f)

so I could get

       Restaurants  Tables  Chairs  Menus  Fridges  Etc...
    0            A       D       G      J        M       P
    1            A       D       G      J        M       P
    2            A       D       G      J        M       P
    0            B       E       H      K        N       Q
    1            B       E       H      K        N       Q
    0            C       F       I      L        O       R
    1            C       F       I      L        O       R
    2            C       F       I      L        O       R

Is there an easy way to do it without typing every column's name?


Solution

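    #!/usr/bin/env python

    import pandas as pd
    from collections import defaultdict

    # Assumes df is the DataFrame from the question, with 'count' as its last column.
    d = defaultdict(list)
    for n in range(len(df)):
        # .ix was removed from pandas; use positional .iloc instead
        count = int(df['count'].iloc[n])
        for c in df.columns.tolist()[:-1]:
            # Repeat this row's value in column c `count` times
            d[c].extend([df[c].iloc[n]] * count)
        # Restart the index at 0 for each original row
        d['index'].extend(range(count))

    new_df = pd.DataFrame(d, index=d['index']).drop(['index'], axis=1)
    new_df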
    
       Restaurants  Tables  Chairs  Menus  Fridges  Etc...
    0            A       D       G      J        M       P
    1            A       D       G      J        M       P
    2            A       D       G      J        M       P
    0            B       E       H      K        N       Q
    1            B       E       H      K        N       Q
    0            C       F       I      L        O       R
    1            C       F       I      L        O       R
    2            C       F       I      L        O       R
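
For a large DataFrame, the Python-level loops above may get slow. A possible alternative, offered only as a minimal sketch and not as part of the original answer, is to repeat the row labels with `Index.repeat` and select them with `.loc`, which never names the individual columns. It assumes the repeat counts live in a column called `count` (as in the question) and that the original index labels are unique; the names `repeated` and `new_df` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'Restaurants': ['A', 'B', 'C'],
                   'Tables': ['D', 'E', 'F'],
                   'Chairs': ['G', 'H', 'I'],
                   'Menus': ['J', 'K', 'L'],
                   'Fridges': ['M', 'N', 'O'],
                   'Etc...': ['P', 'Q', 'R'],
                   'count': [3, 2, 3]})

# Repeat each row label `count` times, then select those labels.
# Assumes the index labels of df are unique.
repeated = df.loc[df.index.repeat(df['count'])]

# Drop the helper column and number the copies of each original row 0, 1, 2, ...
new_df = repeated.drop(columns='count')
new_df.index = repeated.groupby(level=0).cumcount().to_numpy()

print(new_df)
```

If the restarting 0, 1, 2 index is not needed, `repeated.drop(columns='count').reset_index(drop=True)` would give a plain running index instead.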