The following code keeps raising an AttributeError in VS Code, but the same code runs on Google Colab without any error:
The code:
import numpy as np
import pandas as pd
url = 'https://github.com//mattharrison/datasets/raw/master/data/alta-noaa-1980-2019.csv'
alta_df = pd.read_csv(url)
dates = pd.to_datetime(alta_df.DATE)
snow = alta_df.SNOW.rename(dates)
def season(idx):
    year = idx.year
    month = idx.month
    return year.where((month<10), year+1)
snow.groupby(season).sum()
The Error:
AttributeError Traceback (most recent call last)
File
388 year = idx.year
389 month = idx.month
--> 390 return year.where((month<10), year+1)
AttributeError: 'int' object has no attribute 'where'
My understanding is that, since I pass season() as the argument to the chained groupby call, where() should be able to get the year from the snow object. But somehow that is not happening.
To rule out a syntax error in my code, I ran the same code on Google Colab, where I did not face any such issue. I have attached a screenshot of the output from Google Colab for your perusal:
I have also gone through the available solutions for AttributeError on this platform, but could not find any where the error was restricted to VS Code and did not occur on Google Colab or in a Jupyter Notebook.
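When groupby takes a function, it calls that function on each value of the index, one at a time; the call is not vectorized. Each idx is therefore a single Timestamp, idx.year is a plain int, and an int has no where() method. From the documentation: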
by: mapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
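To see where the 'int' in the error message comes from, here is a minimal sketch that compares one index label with the whole index. The three dates and values are made up for illustration and only stand in for the Alta data:

```python
import pandas as pd

# Small synthetic stand-in for the Alta data (dates and values are made up).
idx = pd.to_datetime(["2000-09-30", "2000-10-01", "2001-03-15"])
snow = pd.Series([1.0, 2.0, 3.0], index=idx, name="SNOW")

# One label from the index: a scalar Timestamp, whose .year is a plain int.
label = snow.index[0]
scalar_year = label.year
has_where_scalar = hasattr(scalar_year, "where")

# The whole index: .year is a pandas Index, which does provide .where().
index_year = snow.index.year
has_where_index = hasattr(index_year, "where")

print(type(scalar_year).__name__, has_where_scalar)
print(type(index_year).__name__, has_where_index)
```

The scalar case is what season() receives when groupby calls it per label, which is why year.where(...) fails there.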
Instead, you can call the function yourself on the whole index and pass the resulting keys to groupby:
snow.groupby(season(snow.index)).sum()
Or make your function non-vectorized:
def season(idx):
    year = idx.year
    month = idx.month
    return year if month<10 else year+1
snow.groupby(season).sum()
Output:
1980 457.5
1981 503.0
1982 842.5
1983 807.5
...
2017 524.0
2018 308.8
2019 504.5
Name: SNOW, dtype: float64
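As a quick sanity check, the two variants can be compared on a tiny series. This is only a sketch: the dates and values are invented, and the names season_vectorized and season_scalar are hypothetical labels for the two versions of season() shown above.

```python
import pandas as pd

# Made-up data spanning two October boundaries.
idx = pd.to_datetime(
    ["2000-09-30", "2000-10-01", "2000-12-31", "2001-03-15", "2001-10-05"]
)
snow = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx, name="SNOW")

def season_vectorized(index):
    # Expects the whole DatetimeIndex; October-December roll into the next year.
    year = index.year
    month = index.month
    return year.where(month < 10, year + 1)

def season_scalar(ts):
    # Expects a single Timestamp, which is what groupby passes per label.
    return ts.year if ts.month < 10 else ts.year + 1

# Variant 1: precompute the keys and hand them to groupby.
by_keys = snow.groupby(season_vectorized(snow.index)).sum()

# Variant 2: let groupby call the scalar function on each index label.
by_func = snow.groupby(season_scalar).sum()

print(by_keys)
print(by_func)
```

Both calls should produce the same season totals; the precomputed-keys variant avoids one Python function call per row.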
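Alternatively, use resample with a year-end frequency anchored on September, so that each bin runs from October through the following September: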
snow.resample('Y-SEP').sum()
1980-09-30 457.5
1981-09-30 503.0
1982-09-30 842.5
...
2017-09-30 524.0
2018-09-30 308.8
2019-09-30 504.5
Freq: YE-SEP, Name: SNOW, dtype: float64