python pandas dataframe data-munging

How to efficiently rearrange pandas data as follows?


I need help finding a concise and, above all, efficient way to express the following operation in pandas:

Given a data frame of the format

id    a   b    c   d
1     0   -1   1   1
42    0    1   0   0
128   1   -1   0   1

Construct a data frame of the format:

id     one_entries
1      "c d"
42     "b"
128    "a d"

That is, the column "one_entries" contains the concatenated names of the columns for which the entry in the original frame is 1.


Solution

  • Here's one way, using a boolean mask and applying a lambda function to each row.

    In [58]: df
    Out[58]:
        id  a  b  c  d
    0    1  0 -1  1  1
    1   42  0  1  0  0
    2  128  1 -1  0  1
    
    In [59]: cols = list('abcd')
    
    In [60]: (df[cols] > 0).apply(lambda x: ' '.join(x[x].index), axis=1)
    Out[60]:
    0    c d
    1      b
    2    a d
    dtype: object
    

    You can assign the result to a new column, df['one_entries'].
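
    For reference, here is a self-contained sketch of the approach above. The frame is rebuilt from the question's sample data, and the comparison is written as == 1 rather than > 0 to match the question literally (an assumption on my part; for this data the two give the same mask).

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id': [1, 42, 128],
    'a': [0, 0, 1],
    'b': [-1, 1, -1],
    'c': [1, 0, 0],
    'd': [1, 0, 1],
})

cols = list('abcd')

# Boolean mask: True where the entry equals 1.
mask = df[cols] == 1

# For each row, keep the labels of the True entries and join them with spaces.
df['one_entries'] = mask.apply(lambda x: ' '.join(x[x].index), axis=1)

# Keep only the two columns asked for in the question.
result = df[['id', 'one_entries']]
print(result)
```

    Selecting df[['id', 'one_entries']] at the end gives the two-column frame the question asks for.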

    Here is how the applied function works.

    Take the first row.

    In [83]: x = df[cols].iloc[0] > 0
    
    In [84]: x
    Out[84]:
    a    False
    b    False
    c     True
    d     True
    Name: 0, dtype: bool
    

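    x holds a Boolean value for each column of the row: True where the value is greater than zero (for this data, that is exactly where it equals 1). x[x] keeps only the True entries, leaving a Series whose index consists of the matching column names.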

    In [85]: x[x]
    Out[85]:
    c    True
    d    True
    Name: 0, dtype: bool
    

    x[x].index gives you the column names.

    In [86]: x[x].index
    Out[86]: Index([u'c', u'd'], dtype='object')
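
    Since the question stresses efficiency, here is a possible alternative that avoids the row-wise apply. It is not part of the answer above, only a sketch: it builds the same boolean mask as a NumPy array and joins the column names in a plain list comprehension. Whether it is actually faster depends on the shape of your data, so it is worth timing both versions.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id': [1, 42, 128],
    'a': [0, 0, 1],
    'b': [-1, 1, -1],
    'c': [1, 0, 0],
    'd': [1, 0, 1],
})

cols = list('abcd')

# Plain NumPy boolean array: one row of True/False flags per input row.
mask = df[cols].to_numpy() == 1

# For each row, keep the column names whose flag is True and join them.
one_entries = [
    ' '.join(name for name, flag in zip(cols, row) if flag)
    for row in mask
]

result = pd.DataFrame({'id': df['id'], 'one_entries': one_entries})
print(result)
```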