Search code examples
pythondataframecolumnsorting

Transform data frame to a different form


This is my data frame.

Date Country Value
1/4/1971 Sweden 5.1643
1/5/1971 Sweden 5.1628
1/6/1971 Sweden 5.1614
1/7/1971 Sweden 5.1649
1/8/1971 Sweden 5.1631
1/4/1971 Canada 1.0109
1/5/1971 Canada 1.0102
1/6/1971 Canada 1.0106
1/7/1971 Canada 1.0148
1/8/1971 Canada 1.0154
1/4/1971 India 8.02
1/5/1971 India 8.00
1/6/1971 India 8.01
1/7/1971 India 8.00
1/8/1971 India 8.03

I want above data frame like bellow using python and panda.

Date Sweden Canada India
1/4/1971 5.1643 1.0109 8.02
1/5/1971 5.1628 1.0102 8
1/6/1971 5.1614 1.0106 8.01
1/7/1971 5.1649 1.0148 8
1/8/1971 5.1631 1.0154 8.03

Please help me. Thank You.


Solution

  • Here we create your dataframe for test..

    import pandas as pd
    
    arr = [['1/4/1971', 'Sweden', '5.1643'],
           ['1/5/1971', 'Sweden', '5.1628'],
           ['1/6/1971', 'Sweden', '5.1614'],
           ['1/7/1971', 'Sweden', '5.1649'],
           ['1/8/1971', 'Sweden', '5.1631'],
           ['1/4/1971', 'Canada', '1.0109'],
           ['1/5/1971', 'Canada', '1.0102'],
           ['1/6/1971', 'Canada', '1.0106'],
           ['1/7/1971', 'Canada', '1.0148'],
           ['1/8/1971', 'Canada', '1.0154'],
           ['1/4/1971', 'India', '8.02'],
           ['1/5/1971', 'India', '8.00'],
           ['1/6/1971', 'India', '8.01'],
           ['1/7/1971', 'India', '8.00'],
           ['1/8/1971', 'India', '8.03']]
    df = pd.DataFrame(arr,columns=['Date','Country','Value'])
    print('old form')
    print(df)
    

    The output should be like:

    old form
            Date Country   Value
    0   1/4/1971  Sweden  5.1643
    1   1/5/1971  Sweden  5.1628
    2   1/6/1971  Sweden  5.1614
    3   1/7/1971  Sweden  5.1649
    4   1/8/1971  Sweden  5.1631
    5   1/4/1971  Canada  1.0109
    6   1/5/1971  Canada  1.0102
    7   1/6/1971  Canada  1.0106
    8   1/7/1971  Canada  1.0148
    9   1/8/1971  Canada  1.0154
    10  1/4/1971   India    8.02
    11  1/5/1971   India    8.00
    12  1/6/1971   India    8.01
    13  1/7/1971   India    8.00
    14  1/8/1971   India    8.03
    

    Let's do our magic:

    note: this code not optimized but works well

    table = {}
    for row in df.values:
        date = row[0]
        country = row[1]
        value = row[2]
        if date not in table:table[date] = {country:value}
        else:table[date][country] = value
    
    arr = []
    for date in table.keys():
        row = table[date]
        row = [date,row['Sweden'],row['Canada'],row['India']]
        arr.append(row)
    
    df2 = pd.DataFrame(arr,columns=['Date','Sweden','Canada','India'])
    print('new form')
    print(df2)    
    

    The final output should be

    new form
           Date  Sweden  Canada India
    0  1/4/1971  5.1643  1.0109  8.02
    1  1/5/1971  5.1628  1.0102  8.00
    2  1/6/1971  5.1614  1.0106  8.01
    3  1/7/1971  5.1649  1.0148  8.00
    4  1/8/1971  5.1631  1.0154  8.03