Search code examples
pythonduplicatestranspose

Python: Transpose columns & duplicate rows


I am a complete newbie with Python and I am trying to solve the following problem...

I have read in an excel file with pandas that looks something like this:

Country 2000 2001 2002 2003 2004 ...
Argentina 1.23 4.56 7.89 10.11 12.13 ...
Brazil 14.15 16.17 18.19 20.21 22.23 ...
Chile ... ... ... ... ... ...
Colombia ... ... ... ... ... ...

And what I try to get is something like this:

Country Year Value
Argentina 2000 1.23
Argentina 2001 4.56
Argentina 2002 7.89
Argentina 2003 10.11
Argentina 2004 12.13
... ... ...
Brazil 2000 14.15
Brazil 2001 16.17
Brazil 2002 18.19
Brazil 2003 20.21
Brazil 2004 22.23
... ... ...
Chile 2000 ...
... ... ...

I have already read a few posts about transposing, but have not found anything that addresses my problem.

Can someone help here?


Solution

  • pandas.melt is useful for you to unpivot a dataframe.

    https://pandas.pydata.org/docs/reference/api/pandas.melt.html

    So, you can transform the dataframe to what you want as follows:

    Code

    >>> import pandas as pd
    >>> pd.__version__
    '1.2.4'
    >>> df = pd.DataFrame({
    ...     "Country": ["Argentina", "Brazil",],
    ...     "2000": [1.23, 14.15,],
    ...     "2001": [4.56, 16.17,], 
    ...     "2002": [7.89, 18.19,],
    ...     "2003": [10.11, 20.21,],
    ...     "2004": [12.13, 22.23,],
    ... })  # Create sample data
    
    >>> df
         Country   2000   2001   2002   2003   2004
    0  Argentina   1.23   4.56   7.89  10.11  12.13
    1     Brazil  14.15  16.17  18.19  20.21  22.23
    
    >>> df.melt(id_vars="Country", var_name="Year", value_name="Value")
         Country  Year  Value
    0  Argentina  2000   1.23
    1     Brazil  2000  14.15
    2  Argentina  2001   4.56
    3     Brazil  2001  16.17
    4  Argentina  2002   7.89
    5     Brazil  2002  18.19
    6  Argentina  2003  10.11
    7     Brazil  2003  20.21
    8  Argentina  2004  12.13
    9     Brazil  2004  22.23
    
    >>> # If you want to sort the records by "Country":
    >>> df.melt(id_vars="Country", var_name="Year", value_name="Value").sort_values("Country").reset_index(drop=True)
         Country  Year  Value
    0  Argentina  2000   1.23
    1  Argentina  2001   4.56
    2  Argentina  2002   7.89
    3  Argentina  2003  10.11
    4  Argentina  2004  12.13
    5     Brazil  2000  14.15
    6     Brazil  2001  16.17
    7     Brazil  2002  18.19
    8     Brazil  2003  20.21
    9     Brazil  2004  22.23
    

    Thanks :)