Search code examples
pythonpandasdataframedata-sciencemultiple-columns

Create new column that counts each string match


I aim to create a new column based on multiple string matches. This new column will return a 1 number for every match on the event column.

        Date        Event
0 2022-11-01        Breakfast
1 2022-11-01        Breakfast
2 2022-11-01        Lunch
3 2022-11-02        Breakfast
4 2022-11-02        Lunch

After

        Date        Event        Breakfast        Lunch
0 2022-11-01        Breakfast        1            null
1 2022-11-01        Breakfast        1            null
2 2022-11-01        Lunch            null         1
3 2022-11-02        Breakfast        1            null
4 2022-11-02        Lunch            null         1

Solution

  • You can use pd.get_dummies for the task:

    out = pd.concat([df, pd.get_dummies(df['Event'])], axis=1)
    print(out)
    

    Prints:

             Date      Event  Breakfast  Lunch
    0  2022-11-01  Breakfast          1      0
    1  2022-11-01  Breakfast          1      0
    2  2022-11-01      Lunch          0      1
    3  2022-11-02  Breakfast          1      0
    4  2022-11-02      Lunch          0      1
    

    If you want None instead 0:

    out = out.replace({0: None})
    print(out)
    

    Prints:

             Date      Event Breakfast Lunch
    0  2022-11-01  Breakfast         1  None
    1  2022-11-01  Breakfast         1  None
    2  2022-11-01      Lunch      None     1
    3  2022-11-02  Breakfast         1  None
    4  2022-11-02      Lunch      None     1