Compute numeric value using string response by evaluating numeric and string values in logical function

I am trying to write a function that takes the string response in column "Frequency" and calculates the total number of days per year someone drank alcohol.

The three main values I am trying to take from the string are the numbers that exist in the statement and the word (week, month, year) to calculate the average total number of days someone drank within a year. For example, if someone drank 2-3 times per month the equation would be (2+3/2)*12 = 30 times per year. The data table below shows a sample of the data.

Frequency
1 day per month
3 days per week
1 to 2 days per year
2 days per week
1 day per month
6-11 days per year
5-6 days a week

The table I am trying to produce would have the average days per year as seen below:

Frequency per year
12
156
1.5
104
12
8.5
286

So far I have written the code below:

import pandas as pd 
AlcData = pd.read_excel('Alcohol_Data.xlsx')

#add new column with unittime value for use in function 
AlcData['unittime'] = AlcData.Frequency.str.extract(r'\b(\w+)$', 
                                     expand = True)

def calculatetotaldays(row):
    for x in range(1,11):
#read in row item as string value 
        string = AlcData.Frequency
# create list of number values from the string
        numbers = [int(i) for i in string.split() if i.isdigit()]
 #compute total days if list has length of 1 
    if len(numbers) == 1:
         x = [numbers[j] for j in (0)]
    if row[AlcData.unittime] == 'week': 
        total = x*52
    elif row[AlcData.unittime] == 'month':
        total = x*12
    elif row[AlcData.unittime] == 'year': 
        total = x
#compute total days if list has length of 2 
    if len(numbers) == 2:
        x, y = [numbers[j] for j in (0, 1)]
    if row[AlcData.unittime] == 'week': 
        total = (((x+y)/2)*52)
    elif row[AlcData.unittime] == 'month': 
        total = (((x+y)/2)*12)
    elif row[AlcData.unittime] == 'year': 
        total = ((x+y)/2)
    return total

AlcData['totalperyear'] = AlcData.apply(calculatetotaldays, axis=1)

I am currently getting the error: "'Series' object has no attribute 'split'" while trying to extract numbers across the rows into lists. Does anyone know how to correct this error within the function? More importantly, is this approach (using the length of the list to assign these variables and calculate numbers) is the best way to solve this problem?

I've been struggling with this for a long time so any and all tips on how to compute this information would be extremely helpful.

Solution

This is largely a rewrite, but here's a way to go about it with just pandas:

In [92]: (
    ...:     df['Frequency']
    ...:     .str.split()
    ...:     .str[-1].map({'week': 52, 'month': 12, 'year': 1})
    ...:     .mul(
    ...:         df['Frequency']
    ...:         .str.extract(r'(\d+)\D*(\d+)?')
    ...:         .ffill(axis=1)
    ...:         .astype(int).mean(axis=1)
    ...:     )
    ...: )
Out[92]:
0     12.0
1    156.0
2      1.5
3    104.0
4     12.0
5      8.5
6    286.0
dtype: float64

We can break it into a calculation of the numerical parts and then a multiplier. You can get the numbers out with a regex similar to what you were doing before:

In [89]: df['Frequency'].str.extract(r'(\d+)\D*(\d+)?')
Out[89]:
   0    1
0  1  NaN
1  3  NaN
2  1    2
3  2  NaN
4  1  NaN
5  6   11
6  5    6

and from there, you can use .ffill() and .mean() to turn it into one number:

In [90]: df['Frequency'].str.extract(r'(\d+)\D*(\d+)?').ffill(axis=1).astype(int).mean(axis=1)
Out[90]:
0    1.0
1    3.0
2    1.5
3    2.0
4    1.0
5    8.5
6    5.5
dtype: float64

If you know it always ends in year, month, or week, you can just pick that off and use map for the multiplier:

In [91]: df['Frequency'].str.split().str[-1].map({'week': 52, 'month': 12, 'year': 1})
Out[91]:
0    12
1    52
2     1
3    52
4    12
5     1
6    52
Name: Frequency, dtype: int64

and then you can multiply them as I did above.