
Calculate standard deviation for intervals in dataframe column


I would like to calculate standard deviations over non-rolling (non-overlapping) intervals.

I have a df like this:

value std year
  3   nan 2001
  2   nan 2001
  4   nan 2001
 19   nan 2002
 23   nan 2002
 34   nan 2002

and so on. I would like to calculate the standard deviation of "value" for each year and store it in the "std" column of every row belonging to that year. Every year has the same number of rows, so the length of the intervals never changes.

I already tried:

df["std"] = df.groupby("year").std()

but since the right-hand side returns a new dataframe with one row per year, containing the std of every column grouped by year, it cannot be assigned to a single column, so this does not work.

Thank you all very much for your support!


Solution

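  • If I understand correctly, you want one value per original row:

    use groupby together with the transform() method, which returns a result aligned with the rows of df:

    df['std'] = df.groupby("year")['value'].transform('std')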
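To make the difference between aggregating and transforming concrete, here is a minimal sketch using the sample data from the question. The name `per_year` is introduced only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "value": [3, 2, 4, 19, 23, 34],
    "year": [2001, 2001, 2001, 2002, 2002, 2002],
})

# Aggregation: one result per year, indexed by year,
# so it does not line up with the six rows of df
per_year = df.groupby("year")["value"].std()

# transform: one result per original row, aligned on df's index,
# with the group's std repeated for every row of that year
df["std"] = df.groupby("year")["value"].transform("std")

print(per_year)
print(df)
```

Note that pandas computes the sample standard deviation by default (`ddof=1`). If the population standard deviation is wanted instead, passing a function such as `lambda s: s.std(ddof=0)` to `transform` should work.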
    

    Or, if you want the standard deviation of several columns at once:

    df[['std1', 'std2']] = df.groupby("year")[['column1', 'column2']].transform('std')
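As a variation on the multi-column case, the transformed frame can be joined back onto the original instead of being assigned to a list of new columns. This is only a sketch: the data is made up, `column1` and `column2` are the placeholder names from the answer, and the `_std` suffix and the name `stds` are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical data with two numeric columns
df = pd.DataFrame({
    "year": [2001, 2001, 2001, 2002, 2002, 2002],
    "column1": [3, 2, 4, 19, 23, 34],
    "column2": [1.0, 1.0, 1.0, 2.0, 4.0, 6.0],
})

# Per-year std of both columns, one row per original row
stds = df.groupby("year")[["column1", "column2"]].transform("std")

# Rename the result columns and attach them by index
df = df.join(stds.add_suffix("_std"))

print(df)
```

Joining on the index keeps each result column tied to the source column it was computed from.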