Consider this simplified df:
import pandas as pd
data = {'Name_Type': ["Primary", "Primary", "AKA", "Primary"],
        'Name': ["John", "Daniel", "Dan", "Bob"],
        'Surname': ["Green", "Brown", "Brown", "White"],
        'Country Type': ["Origin", "Origin", None, "Origin"],
        'Country': ["UK", "UK", None, "UK"],
        'Other': ["Info", None, None, "Info"]}
df = pd.DataFrame(data)
  Name_Type    Name Surname Country Type Country Other
0   Primary    John   Green       Origin      UK  Info
1   Primary  Daniel   Brown       Origin      UK  None
2       AKA     Dan   Brown         None    None  None
3   Primary     Bob   White       Origin      UK  Info
So I want to add new values under each row whose Country Type is "Origin" (i.e. not None). If the row directly below is an already generated row with None in Country Type (like row 2 in the example), I want to set Country Type to "Citizenship" and Country to "UK" in that row. If there is no such row, I want to insert a new row below the current one with the same values. The final output should look like this:
  Name_Type    Name Surname Country Type Country Other
0   Primary    John   Green       Origin      UK  Info
1      None    None    None  Citizenship      UK  None
2   Primary  Daniel   Brown       Origin      UK  None
3       AKA     Dan   Brown  Citizenship      UK  None
4   Primary     Bob   White       Origin      UK  Info
5      None    None    None  Citizenship      UK  None
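For reference, the rule described above can also be spelled out as a plain row-by-row loop, which is handy for checking a vectorised solution against. This is only a sketch, not part of the original question: it assumes the follow-up values are always the hypothetical FILL dict below, and that an already generated row is recognised purely by a missing Country Type directly under an "Origin" row. It rebuilds the frame from Python dicts, so it will be slower than a vectorised approach on large data.

```python
import pandas as pd

data = {'Name_Type': ["Primary", "Primary", "AKA", "Primary"],
        'Name': ["John", "Daniel", "Dan", "Bob"],
        'Surname': ["Green", "Brown", "Brown", "White"],
        'Country Type': ["Origin", "Origin", None, "Origin"],
        'Country': ["UK", "UK", None, "UK"],
        'Other': ["Info", None, None, "Info"]}
df = pd.DataFrame(data)

# Hypothetical: the values written into the row below each "Origin" row
FILL = {'Country Type': 'Citizenship', 'Country': 'UK'}

rows = df.to_dict(orient='records')
out_rows = []
i = 0
while i < len(rows):
    row = rows[i]
    out_rows.append(row)
    if row['Country Type'] == 'Origin':
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        if nxt is not None and pd.isna(nxt['Country Type']):
            # A row with a missing Country Type already follows: fill it in place
            out_rows.append({**nxt, **FILL})
            i += 2
            continue
        # No such row: insert a new row that is empty apart from FILL
        out_rows.append({**{col: None for col in df.columns}, **FILL})
    i += 1

result = pd.DataFrame(out_rows, columns=df.columns)
print(result)
```

On the example data this yields six rows indexed 0 to 5: "Citizenship"/"UK" is written into the existing AKA row and into two newly inserted rows, one after John and one after Bob.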
You can use a boolean mask to identify the rows to duplicate, then concatenate the different parts with pd.concat and restore the original order with sort_index:
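# rows where Country Type is missing
m = df['Country Type'].isna()
# rows to duplicate: Country Type is present and the next row's is not missing
# (fill_value=False keeps the shifted mask boolean; the last row has no row below)
m2 = ~(m | m.shift(-1, fill_value=False))
# values to fill
fill = {'Country Type': 'Citizenship',
        'Country': 'UK'}
# concatenation
out = pd.concat([df[m2],                                  # rows that get a new row below them
                 df.loc[m2, ['Country']].assign(**fill),  # the new rows, sharing their parent's index label
                 df[~m2].fillna(fill)                     # all other rows, missing Country Type/Country filled
                 ]).sort_index(kind='stable')             # stable sort puts each new row right after its parent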
Output:
  Name_Type    Name Surname Country Type Country Other
0   Primary    John   Green       Origin      UK  Info
0       NaN     NaN     NaN  Citizenship      UK   NaN
1   Primary  Daniel   Brown       Origin      UK  None
2       AKA     Dan   Brown  Citizenship      UK  None
3   Primary     Bob   White       Origin      UK  Info
3       NaN     NaN     NaN  Citizenship      UK   NaN
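To see what the two masks select on the example, and to get the 0 to 5 index from the question instead of the repeated labels 0 and 3, the same steps can be followed by reset_index(drop=True). This is a sketch under the same assumptions as above; it uses shift(-1, fill_value=False) so the shifted mask stays boolean and the last row is treated as having no row below it.

```python
import pandas as pd

data = {'Name_Type': ["Primary", "Primary", "AKA", "Primary"],
        'Name': ["John", "Daniel", "Dan", "Bob"],
        'Surname': ["Green", "Brown", "Brown", "White"],
        'Country Type': ["Origin", "Origin", None, "Origin"],
        'Country': ["UK", "UK", None, "UK"],
        'Other': ["Info", None, None, "Info"]}
df = pd.DataFrame(data)

# m: Country Type missing; m2: present here AND in the next row -> needs a new row
m = df['Country Type'].isna()
m2 = ~(m | m.shift(-1, fill_value=False))
print(m.tolist())   # [False, False, True, False]
print(m2.tolist())  # [True, False, False, True]

fill = {'Country Type': 'Citizenship', 'Country': 'UK'}
out = (pd.concat([df[m2],
                  df.loc[m2, ['Country']].assign(**fill),
                  df[~m2].fillna(fill)])
         .sort_index(kind='stable')   # new rows land right after their parent
         .reset_index(drop=True))     # renumber 0..n-1 after the sort
print(out)
```

The inserted rows still hold NaN rather than None in the unused columns; pandas treats both as missing, so isna() is True for either.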