Search code examples
pandasdataframenumpytransform

DataFrame transformation: from multiple columns to one


How can I transform with Pandas & NumPy this DataFrame:

initial table

to DataFrame like:

        Name      Year     Nb 
------+--------+--------+-------
0     |  A     |   2021 |  5.0
1     |  A     |   2020 |  4.0
2     |  A     |   2019 | 10.0
3     |  A     |   2018 |  4.0
4     |  A     |   2017 |  4.0
...
k     |  A-Jay |   2021 |  5.0
k+1   |  A-Jay |   2020 |  6.0
...
l+i   |  A.J.  |   2019 |  3.0
m     |  Aaban |   2021 |  4.0
m+1   |  Aaban |   2020 |  4.0
...

?


Solution

  • Here's a way (probably the most elegant) using melt():

    out = ( df
        .melt(id_vars='Name', var_name='Year', value_name='Nb')
        .dropna()
        .sort_values(['Name','Year'], ascending=[True,False])
        .reset_index(drop=True) )
    

    Here's another way, this one using stack():

    out = ( df
        .set_index('Name')
        .stack()
        .reset_index()
        .rename(columns={'level_1':'Year',0:'Nb'})
        .sort_values(['Name','Year'],ascending=[True,False])
        .reset_index(drop=True) )
    

    Sample input:

      Name  2021  2022  2023  2024
    0    a   NaN   4.0  None   0.0
    1    b   2.0   NaN  None   NaN
    2    c   3.0   6.0  None   0.0
    

    Output:

      Name  Year   Nb
    0    a  2024  0.0
    1    a  2022  4.0
    2    b  2021  2.0
    3    c  2024  0.0
    4    c  2022  6.0
    5    c  2021  3.0