I have a function that works fine with individual values, but when I use it with pandas Series.apply(), it raises an OverflowError.
from __future__ import division
import pandas as pd
import numpy as np
birthdays = pd.DataFrame(np.empty([365,2]), columns = ['k','probability'], index = range(1,366))
birthdays['k'] = birthdays.index
I make a function:
def probability_of_shared_bday(k):
    end_point = 366 - k
    numerator = 1
    for i in range(end_point, 366):
        numerator = numerator*i
    denominator = 365**k
    probability_of_no_match = (1 - numerator/denominator)
    return probability_of_no_match
When I try this out with individual integers, it works fine:
probability_of_shared_bday(1)
0.0
probability_of_shared_bday(100)
0.9999996927510721
But when I try to use this function with apply:
birthdays['probability'] = birthdays['k'].apply(probability_of_shared_bday, convert_dtype=False)
OverflowError: integer division result too large for a float
This happens regardless of whether convert_dtype is True or False.
Checking birthdays['k'].dtypes gives dtype('int64').
I'm not sure why you have this problem with apply, but the function should not be written this way in the first place: it builds two huge integers and only then divides one by the other. Here is a suggestion that avoids that by multiplying the ratios one term at a time:
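def probability_of_shared_bday(k):
    end_point = 366 - k
    ratio = 1
    # Multiply one fraction at a time so the running value stays between 0 and 1.
    for i in range(end_point, 366):
        ratio *= i / 365
    # ratio is P(all k birthdays are distinct); its complement is
    # P(at least one shared birthday), which is what gets returned.
    probability_of_no_match = (1 - ratio)
    return probability_of_no_match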
And the problem goes away!
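A possible explanation for the original error, offered as an assumption rather than something confirmed in this thread: if apply hands each value to the function as a numpy.int64 instead of a plain Python int (this may depend on the pandas version), then 365**k is computed in 64-bit arithmetic and wraps around, so the later division no longer involves the true denominator. The sketch below shows the wraparound directly with NumPy, without going through pandas. The names probability_of_shared_bday_exact and shared_bday_ratio are hypothetical and used only for illustration.

```python
import numpy as np

# Assumed stand-in for the kind of value apply() might pass to the function.
k = np.int64(100)

# Plain Python ints have arbitrary precision, so this power is exact.
exact = 365 ** int(k)

# With a NumPy int64 exponent the power is evaluated as a 64-bit integer,
# which cannot hold 365**100, so the stored value is not the true result.
with np.errstate(over="ignore"):
    wrapped = 365 ** k

print(type(wrapped).__name__)
print(int(wrapped) == exact)


def probability_of_shared_bday_exact(k):
    """Original big-integer approach, with k cast to a Python int first."""
    k = int(k)  # avoids fixed-width integer arithmetic in 365**k
    numerator = 1
    for i in range(366 - k, 366):
        numerator = numerator * i
    denominator = 365 ** k
    return 1 - numerator / denominator


def shared_bday_ratio(k):
    """Ratio-based approach: never forms 365**k, so the input type is not an issue."""
    ratio = 1.0
    for i in range(int(366 - k), 366):
        ratio *= i / 365
    return 1 - ratio


print(probability_of_shared_bday_exact(np.int64(100)))
print(shared_bday_ratio(np.int64(100)))
```

If the assumption holds, casting with int(k) is the smallest change to the original function, while the ratio-based version sidesteps the issue entirely because it only ever works with floats between 0 and 1.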