This is the dataframe I have:
chr value region
chr22 1 21-77
chr6 3 12-65
chr3 5 73-81
chr3 8 91-96
And this is what I need:
21-77 12-65 73-81 91-96
chr22 1 0 0 0
chr6 0 3 0 0
chr3 0 0 5 8
Please note that the first column of the initial dataframe contains duplicate values. (such as chr3)
Could you tell me how I can achieve this please. Thanks in advance.
Looks like the perfect application for pandas pivot_table.
Worth highlighting that pivot_table uses numpy mean as aggregation function (in case there are multiple observations with same index & column. So it implicitly requires numbers (int/floats) as values by default.
Let frame
be the pandas dataframe containing your data:
import pandas as pd
cc = ['chr', 'value', 'region']
vals = [['chr22', 1, '21-77'],
['chr6', 3, '12-65'],
['chr3', 5, '73-81'],
['chr3', 8, '91-96']]
frame = pd.DataFrame(vals, columns = cc)
result = pd.pivot_table(frame,
values = 'value', index = ['chr'], columns = ['region'],
fill_value = 0)