I have data as follows:
country state area people
India Tamil Nadu urban 194
India Karnataka urban 96
Sri Lanka state1 urban 600
India Chennai urban 302
Sri Lanka state2 urban 213
India Bengaluru rural 34
Sri Lanka state3 rural 173
U.S.A NYC urban 300
Germany Scehltr rural 87
India west Bengal urban 902
Sri Lanka State 9 rural 102
And I want to group this data as here:
country state area people
India Tamil Nadu urban 194
Karnataka urban 96
Bengaluru rural 34
Chennai urban 302
Sri Lanka state1 urban 600
state2 urban 213
state3 rural 173
And I want to insert this data into a csv file in the same exact format, such that when the csv file is opened, I would get the data as in this above output.
I will be using Python Pandas for this task.
Filter only duplicated values by country
column and then sorting by same column, last repalce duplicated values to empty strings:
df = df[df.duplicated(['country'], keep=False)].sort_values('country', ignore_index=True)
df.loc[df.duplicated(subset=['country']), 'country'] = ''
print (df)
country state area people
0 India Tamil Nadu urban 194
1 Karnataka urban 96
2 Chennai urban 302
3 Bengaluru rural 34
4 west Bengal urban 902
5 Sri Lanka state1 urban 600
6 state2 urban 213
7 state3 rural 173
8 State 9 rural 102
If use MultiIndex in file are values repeated:
df = df.set_index(['country','state'])
print (df)
area people
country state
India Tamil Nadu urban 194
Karnataka urban 96
Chennai urban 302
Bengaluru rural 34
west Bengal urban 902
Sri Lanka state1 urban 600
state2 urban 213
state3 rural 173
State 9 rural 102