Search code examples
pythonpandasdataframeanacondaunpivot

Data format using python?


I am fairly new to data field, I have problem like this, here is my dataframe

------------------------------------------------------
ErrorCD    ID    Freq1      Freq2     Freq3....
------------------------------------------------------
1          A      2          3           2
2          B      1          2           2
3          C      1          3           3

And I want it be like this:

---------------------
ErrorCD  ID    Freq
---------------------
1        A      2
2        A      3
3        A      2
.....

How can do this in using python?


Solution

  • You want to stack

    df.set_index(['ErrorCD', 'ID']).stack().reset_index(name='Freq')
    
       ErrorCD ID level_2  Freq
    0        1  A   Freq1     2
    1        1  A   Freq2     3
    2        1  A   Freq3     2
    3        2  B   Freq1     1
    4        2  B   Freq2     2
    5        2  B   Freq3     2
    6        3  C   Freq1     1
    7        3  C   Freq2     3
    8        3  C   Freq3     3
    

    We can get rid of the column of FreqX with

    df.set_index(['ErrorCD', 'ID']).stack().reset_index(name='Freq').drop('level_2', 1)
    
       ErrorCD ID  Freq
    0        1  A     2
    1        1  A     3
    2        1  A     2
    3        2  B     1
    4        2  B     2
    5        2  B     2
    6        3  C     1
    7        3  C     3
    8        3  C     3
    

    Another approach, rebuilding

    f = df.filter(regex='^Freq')
    m = f.shape[1]
    pd.DataFrame(dict(
        ErrorCD=df.ErrorCD.values.repeat(m),
        ID=df.ID.values.repeat(m),
        Freq=f.values.ravel()
    ))
    
       ErrorCD  Freq ID
    0        1     2  A
    1        1     3  A
    2        1     2  A
    3        2     1  B
    4        2     2  B
    5        2     2  B
    6        3     1  C
    7        3     3  C
    8        3     3  C
    

    You can also use pd.DataFrame.melt

    df.melt(['ErrorCD', 'ID'], value_name='Freq').drop('variable', 1)
    
       ErrorCD ID  Freq
    0        1  A     2
    1        2  B     1
    2        3  C     1
    3        1  A     3
    4        2  B     2
    5        3  C     3
    6        1  A     2
    7        2  B     2
    8        3  C     3