
Add new values into a row below or a newly created row below based on conditions


Consider this simplified df:

import pandas as pd
data = { 'Name_Type': ["Primary", "Primary", "AKA", "Primary"],
         'Name': ["John", "Daniel", "Dan", "Bob"],
         'Surname': ["Green", "Brown", "Brown", "White"],
         'Country Type': ["Origin", "Origin", None, "Origin"],
         'Country': ["UK", "UK", None, "UK"],
         'Other': ["Info", None, None, "Info"]}
df = pd.DataFrame(data)
  Name_Type    Name Surname Country Type Country Other
0   Primary    John   Green       Origin      UK  Info
1   Primary  Daniel   Brown       Origin      UK  None
2       AKA     Dan   Brown         None    None  None
3   Primary     Bob   White       Origin      UK  Info

I want to add new values below each row whose Country Type is not None. If the next row already exists with None in Country Type (like row 2 in the example), I want to put "Citizenship" in its Country Type column and "UK" in its Country column. If there is no such row, I want to create a new row below the current one with the same values. The final output should look like this:

  Name_Type    Name Surname Country Type Country Other
0   Primary    John   Green       Origin      UK  Info
1      None    None    None  Citizenship      UK  None
2   Primary  Daniel   Brown       Origin      UK  None
3       AKA     Dan   Brown  Citizenship      UK  None
4   Primary     Bob   White       Origin      UK  Info
5      None    None    None  Citizenship      UK  None

Solution

  • You can use a boolean mask to identify the rows to duplicate, then concat the different parts and restore the original order with a stable sort_index:

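    # rows whose 'Country Type' is empty (the pre-generated placeholder rows)
    m = df['Country Type'].isna()
    # rows to duplicate: not empty themselves and not followed by an
    # empty row; fill_value=False keeps the shifted mask boolean
    m2 = ~(m | m.shift(-1, fill_value=False))

    # values to fill
    fill = {'Country Type': 'Citizenship',
            'Country': 'UK'}

    # concatenation: the rows to duplicate, their new rows, and the
    # remaining rows with empty cells filled; the stable sort keeps
    # each new row right after the row it was created from
    out = pd.concat([df[m2],
                     df.loc[m2, ['Country']]
                       .assign(**fill),
                     df[~m2].fillna(fill)
                    ]).sort_index(kind='stable')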
    

    Output:

      Name_Type    Name Surname Country Type Country Other
    0   Primary    John   Green       Origin      UK  Info
    0       NaN     NaN     NaN  Citizenship      UK   NaN
    1   Primary  Daniel   Brown       Origin      UK  None
    2       AKA     Dan   Brown  Citizenship      UK  None
    3   Primary     Bob   White       Origin      UK  Info
    3       NaN     NaN     NaN  Citizenship      UK   NaN
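
    The output above keeps duplicated index labels (0, 0, 1, 2, 3, 3), while the desired output in the question is numbered 0 to 5. A minimal sketch of one way to get there is to finish with reset_index(drop=True). The function name add_citizenship_rows is hypothetical and only used here for illustration, and passing fill_value=False to shift is an assumption made so the mask stays boolean:

```python
import pandas as pd

df = pd.DataFrame({
    'Name_Type': ["Primary", "Primary", "AKA", "Primary"],
    'Name': ["John", "Daniel", "Dan", "Bob"],
    'Surname': ["Green", "Brown", "Brown", "White"],
    'Country Type': ["Origin", "Origin", None, "Origin"],
    'Country': ["UK", "UK", None, "UK"],
    'Other': ["Info", None, None, "Info"],
})


def add_citizenship_rows(frame, fill):
    """Hypothetical helper: same mask + concat approach, then renumber rows."""
    # rows whose 'Country Type' is empty (placeholder rows to fill in place)
    empty = frame['Country Type'].isna()
    # rows that need a brand-new row below them: not empty themselves
    # and not already followed by an empty row
    needs_new_row = ~(empty | empty.shift(-1, fill_value=False))

    # new rows carry only the fill values; all other columns become NaN
    new_rows = frame.loc[needs_new_row, ['Country']].assign(**fill)

    out = pd.concat([
        frame[needs_new_row],           # rows that get a new row below
        new_rows,                       # the new rows (same index labels)
        frame[~needs_new_row].fillna(fill),  # remaining rows, gaps filled
    ])
    # stable sort keeps each new row right after its source row,
    # then the index is renumbered 0..n-1
    return out.sort_index(kind='stable').reset_index(drop=True)


result = add_citizenship_rows(
    df, {'Country Type': 'Citizenship', 'Country': 'UK'}
)
print(result)
```

    Note that fillna(fill) only touches the two columns named in fill, so other missing values (such as Other in the "Dan" row) are left as they are.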