python pandas rate

Error calculating rate of growth conditional on one column


I would like to calculate the rate of growth between two years (1992 and 2000) conditional on a variable (grid100). My goal is to create a new column, called "rate_growth", which gives, for each grid100, the rate of growth of the variable "nights".

Following other answers on Stack Overflow, I tried the following:

df['rate_growth']= df.assign(pct_change=df.groupby(['grid100']).nights.pct_change())

However, it gives me the following error: "ValueError: Expected a 1D array, got an array with shape (6868, 4)".

Any idea how I can solve this issue?

This is an image of what my dataframe looks like:

[image]


Solution

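  • DataFrame.assign adds the new column to the original DataFrame and returns all of the original columns plus the newly assigned one. Its result is a whole DataFrame, so it cannot be assigned to the single column rate_growth. If you want to use assign, pass the new column name as a keyword and assign the result back to df: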

    df = df.assign(rate_growth=df.groupby(['grid100']).nights.pct_change())
    

    If you want to assign to the column rate_growth directly, drop the assign call:

    df['rate_growth']= df.groupby(['grid100']).nights.pct_change()
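
As a minimal sketch of both variants, here is a small made-up DataFrame. The column names grid100 and nights come from the question; the year column, the values, and the sort step are assumptions for illustration, since the real data is only shown as an image. pct_change compares each row with the previous row of the same group, so the rows probably need to be ordered by year within each grid100 first.

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are invented.
df = pd.DataFrame({
    "grid100": [1, 1, 2, 2],
    "year": [1992, 2000, 1992, 2000],
    "nights": [10.0, 15.0, 20.0, 10.0],
})

# Assumption: order rows so 1992 comes before 2000 inside each grid100,
# because pct_change compares each row with the previous one in its group.
df = df.sort_values(["grid100", "year"])

# Variant 1: plain column assignment. The groupby result is a Series
# aligned to df's index, so it fits into a single column.
df["rate_growth"] = df.groupby("grid100")["nights"].pct_change()

# Variant 2: assign returns a complete DataFrame, so keep the whole result
# instead of putting it into one column.
df2 = df.assign(rate_growth=df.groupby("grid100")["nights"].pct_change())

print(df)
```

The first row of each grid100 has no earlier row to compare with, so its rate_growth is NaN; only the 2000 rows carry the growth rate relative to 1992.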