Search code examples
pythonpandas

How to count the occurrences of a value in a data frame?


I have a data frame [df] that looks like this but much lagger:

title of the novel                author          publishing year   mentionned cities   
0   Beasts and creatures        Bruno Ivory             2021           New York 
0   Monsters                    Renata Mcniar           2023           New York 
0   At risk                     Charles Dobi            2020           London   
0   Manuela and Ricardo         Lucas Zacci             2022           Rio de Janeiro
0   War against the machine     Angelina Trotter        1999           Rio de Janeiro

I would like to add another column with the objective of counting all the occurences of the cities. The problem is that I want to maintain the year of that occurrence, as I work with history. In other words, it is important for me to be able to know when the city was mentionned.

The expected outcome would look like this:

title of the novel     author    publishing year    mentionned cities       Counter
0   Beasts and creatures        Bruno Ivory             2021           New York   1 
0   Monsters                    Renata Mcniar           2022           New York   2
0   At risk                     Charles Dobi            2020           London     1
0   Manuela and Ricardo         Lucas Zacci             2022           Rio de Janeiro 1
0   War against the machine     Angelina Trotter        1999           Rio de Janeiro 2

So far, I have just managed to count all the occurrences, but I could not relate it to the publishing years. The code I am using is:

df ['New York'] = df.eq('New York').sum().to_frame().T

Can someone help me, please?

edit:

I tried joining two dataframes and I got something interesting but not what I really wanted. The problem is that it does not keep the Publishing year on track.

    d[f'counter'] = d.fgroupby('mentionned cities')['mentionned cities'].transform('counter')

result = pd.concat([df['New York'], df], axis=1, join='inner')
    display(result)

Output:

title of the novel     author    publishing year    mentionned cities       Counter
    0   Beasts and creatures        Bruno Ivory             2021           New York   2 
    0   Monsters                    Renata Mcniar           2023           New York   2
    0   At risk                     Charles Dobi            2020           London     1
    0   Manuela and Ricardo         Lucas Zacci             2022           Rio de Janeiro 1
    0   War against the machine     Angelina Trotter        1999           Rio de Janeiro 2

The problem still lingers on


Solution

  • Perhaps you could use a for loop to iterate through the 'mentioned cities' column, and use a dict to count the occurrences of cities:

    city_count = {}
    count_column = []
    for city in df['mentioned cities']:
        city_count[city] = city_count.get(city, 0) + 1
        count_column.append(city_count[city])
    
    df['Counter'] = count_column