Search code examples
pythonpandascsvdata-manipulationdata-cleaning

Concatenating multiple CSV files based on column values,but the multiple csv files have the same header but vary in order


I'm cleaning some data.I have data from multiple subjects for multiple subjects over multiple trails.

SubNo Trails Score 
1       1      4
1       2      4
1       3      8
7       1      9
7       2      8
7       3      8
19
:
:

For the same subject, I have another dataset for indifferent order for SubNo

SubNo Trails Height 
19      1      100
19      2      400
19      3      810
7       1      911
7       2      811
7       3      811
20      1      222
20      2      222
20      3      789
1
1
:
:

I want to join these two on SubNo, such that in the end I have one CSV per subject for both score and height.

SubNo Trails Score Height 
1        1     4     198
1        2     4     209
1        3     8     289
2        1     :      :
2        2
2        3

Here, I have joined the same data based on subNo: So,all the values of 1 together,all values of subject 2 together and so on.In my two csv files the order of subject is not the same. So,I don't want to join based on header,but based on specific subject number.In my case ,that is 1,2,17,...like that. How should I go about it? (I tried pandas merge,it works based on header).That's won't do what I want.


Solution

  • Okey,so the solution I found was to sort each csv file on subnum and concatenate.

    df1.sort_values(by=['Subnum','Trials'], ascending=True)
    df2..sort_values(by=['Subnum','Trials'], ascending=True)
    pd.concat([df1,df2],axis=1)