Tags: pandas, dataframe, conditional-statements, rows

How to group rows based on a condition in a dataframe with python pandas?


I want to change the age ranges (EDAT) so that the first two ranges, currently '0 a 9' and '10 a 19', become a single range, '0 a 19', without changing the other values.

import pandas as pd

df = pd.DataFrame({
    'DATA': ['2021-10-10', '2021-10-10', '2021-10-10', '2021-10-10', '2021-10-10',
             '2021-10-10', '2021-10-10', '2021-10-10', '2021-10-10', '2021-10-10'],
    'EDAT': ['0 a 9', '10 a 19', '10 a 19', '20 a 29', '20 a 29',
             '20 a 29', '30 a 39', '30 a 39', '30 a 39', '30 a 39'],
    'ESDEVENIMENT': ['Cas', 'Cas', 'Cas', 'Cas', 'Cas',
                     'Hospitalització', 'Cas', 'Cas', 'Cas', 'Hospitalització'],
    'PAUTA': ['No iniciada', 'Completa', 'No iniciada', 'Completa', 'No iniciada',
              'No iniciada', 'Completa', 'No iniciada', 'Parcial', 'No iniciada'],
    'RECOMPTE': [6, 5, 6, 3, 4, 2, 7, 10, 1, 2],
})

Solution

  • You should read "Working with text data" in the pandas user guide.

    Use str.replace:

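    # Anchor the pattern so only whole labels match; unanchored, '0 a 9' would
    # also match inside a hypothetical label such as '90 a 99'.
    df['EDAT'] = df['EDAT'].str.replace(r'^(?:0 a 9|10 a 19)$', '0 a 19', regex=True)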
    print(df)
    
    # Output
             DATA     EDAT     ESDEVENIMENT        PAUTA  RECOMPTE
    0  2021-10-10   0 a 19              Cas  No iniciada         6
    1  2021-10-10   0 a 19              Cas     Completa         5
    2  2021-10-10   0 a 19              Cas  No iniciada         6
    3  2021-10-10  20 a 29              Cas     Completa         3
    4  2021-10-10  20 a 29              Cas  No iniciada         4
    5  2021-10-10  20 a 29  Hospitalització  No iniciada         2
    6  2021-10-10  30 a 39              Cas     Completa         7
    7  2021-10-10  30 a 39              Cas  No iniciada        10
    8  2021-10-10  30 a 39              Cas      Parcial         1
    9  2021-10-10  30 a 39  Hospitalització  No iniciada         2
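
The relabelled rows in the output above are still separate. If the goal is also to collapse rows that now share the same labels, as the question title suggests, something like the following sketch may help. It assumes that RECOMPTE should be summed and that all other columns act as grouping keys; neither is stated in the question. It also uses `Series.replace` with a dict, which matches whole values only, as an alternative to the regex. The frame is a shortened stand-in for the one in the question.

```python
import pandas as pd

# Shortened stand-in for the question's frame (same columns, fewer rows).
df = pd.DataFrame({
    'DATA': ['2021-10-10'] * 4,
    'EDAT': ['0 a 9', '10 a 19', '10 a 19', '20 a 29'],
    'ESDEVENIMENT': ['Cas'] * 4,
    'PAUTA': ['No iniciada', 'Completa', 'No iniciada', 'Completa'],
    'RECOMPTE': [6, 5, 6, 3],
})

# Exact-value replacement: only cells equal to a dict key are changed.
df['EDAT'] = df['EDAT'].replace({'0 a 9': '0 a 19', '10 a 19': '0 a 19'})

# Assumption: rows that now share every label should have RECOMPTE summed.
grouped = (
    df.groupby(['DATA', 'EDAT', 'ESDEVENIMENT', 'PAUTA'], as_index=False)['RECOMPTE']
      .sum()
)
print(grouped)
```

With this stand-in data, the two '0 a 19' / 'Cas' / 'No iniciada' rows collapse into one row with RECOMPTE 12, while the remaining rows keep their original counts.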