python-3.x pandas global-variables apply

Pandas .apply() function not always being called in python 3


Hello, I want to increment a global variable 'count' from a function that is called on a pandas dataframe of length 1458.

I have read other answers that say .apply() does not work in place. I followed their advice, but the count variable still ends up at 4.

count = 0
def cc(x):
   global count
   count += 1
   print(count) 

# Expected final value of count is 1458, but instead it is 4.
# I think it's 4 because 'PoolQC' is a categorical column with 4 possible values.
# I want the count variable to be 1458 by the end; instead it shows 4.


all_data['tempo'] = all_data['PoolQC'].apply(cc)

# prints 4 instead of 1458
print("Count final value is ",count)

Solution

  • Yes, the observed effect occurs because the column has a categorical dtype. pandas is being smart here: it evaluates apply only once per category rather than once per row. Is counting the only thing you're doing there? I guess not, but why do you need such a calculation? Can't you use df.shape?

    A couple of options I see here:

    1. You can change the type of the column, e.g.

    all_data['tempo'] = all_data['PoolQC'].astype(str).apply(cc)

    2. You can use a different, non-categorical column.

    3. You can use df.shape to see how many rows you have in the df.

    4. You can use apply on the whole DataFrame, like all_data['tempo'] = all_data.apply(cc, axis=1). In that case you can still use whatever is in all_data['PoolQC'] within the cc function, like:

    def cc(x):
        global count
        count += 1
        print(count)
        return x['PoolQC']
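To compare the options side by side, here is a minimal sketch. The small `all_data` frame and the `count_calls` helper are hypothetical stand-ins, not part of the original question, and the helper keeps its counter in a `nonlocal` variable instead of a global so the example is self-contained. The exact number of calls on a categorical column may depend on your pandas version, so no specific value is claimed for it.

```python
import pandas as pd

# Hypothetical stand-in for all_data: 12 rows, 3 distinct 'PoolQC' values.
all_data = pd.DataFrame({"PoolQC": ["Ex", "Gd", "Fa"] * 4})
all_data["PoolQC"] = all_data["PoolQC"].astype("category")


def count_calls(obj, **apply_kwargs):
    """Return how many times obj.apply() invokes the callback."""
    calls = 0

    def cc(x):
        nonlocal calls
        calls += 1
        return x  # pass the value through unchanged

    obj.apply(cc, **apply_kwargs)
    return calls


# Categorical column: the callback may run once per category, not per row.
calls_cat = count_calls(all_data["PoolQC"])

# Option 1: convert to str first, so the callback runs for every row.
calls_str = count_calls(all_data["PoolQC"].astype(str))

# Option 4: row-wise apply over the whole DataFrame.
calls_rows = count_calls(all_data, axis=1)

# Option 3: if only the row count is needed, no apply is required at all.
n_rows = all_data.shape[0]

print("categorical:", calls_cat)
print("as str:", calls_str)
print("row-wise:", calls_rows)
print("shape[0]:", n_rows)
```

If the callback has side effects that must happen for every row, the `astype(str)` or row-wise variants are the ones to reach for; if only the number of rows matters, `shape[0]` avoids the callback entirely.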