x y
1.2 3.1
1.4 3.5
1.5 3.2
2.2 3.6
2.2 2.8
2.3 3.3
2.4 3.5
2.5 3.8
2.7 3.4
2.8 3.3
Say I have the dataframe above, and I wish to write a function
def ave(pd,minx,maxx):
which calculates the average of the y values whose corresponding x values lie between minx and maxx, i.e. in the following example:
ave(file, 2, 3) #where file is wherever I import these x and y values from
it would return 3.3857...
I have tried the following:
def ave(pd,minx,maxx):
    x = list(data.iloc[:, 0].values)
    y = list(data.iloc[:, 1].values)
    lst=[]
    for i in x:
        if x[i]>xmin and x[i]<xmax:
            lst+=y[i]
    return (sum(lst)/len(list))
but this gives the error: list indices must be integers or slices, not numpy.float64
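The error comes from "for i in x": that loop yields the values of x (1.2, 1.4, ...), not positions, so x[i] tries to index a list with a float. But why not just select the rows where those conditions are true? You really should avoid looping as much as possible when working with dataframes.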
def y_average(df, min_x, max_x):
    return df[(df["x"] > min_x) & (df["x"] < max_x)]["y"].mean()
Usage:
In [3]: y_average(df, 2, 3)
Out[3]: 3.3857142857142857
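For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the columns are literally named "x" and "y" and builds the dataframe inline from the values in the question rather than reading a file. The helper y_average_loop is a hypothetical name added only for comparison: it shows one way the original loop could be repaired, by pairing each x with its y via zip instead of indexing by value.

```python
import pandas as pd

# Sample data copied from the question; the column names "x" and "y" are assumed.
df = pd.DataFrame({
    "x": [1.2, 1.4, 1.5, 2.2, 2.2, 2.3, 2.4, 2.5, 2.7, 2.8],
    "y": [3.1, 3.5, 3.2, 3.6, 2.8, 3.3, 3.5, 3.8, 3.4, 3.3],
})


def y_average(df, min_x, max_x):
    """Mean of y over rows with min_x < x < max_x (both bounds exclusive)."""
    mask = (df["x"] > min_x) & (df["x"] < max_x)  # boolean Series, one entry per row
    return df.loc[mask, "y"].mean()


def y_average_loop(df, min_x, max_x):
    """Loop version, for comparison only (hypothetical helper, not from the answer)."""
    # zip pairs each x value with its y value, so no positional indexing is needed.
    selected = [y for x, y in zip(df["x"], df["y"]) if min_x < x < max_x]
    # Note: raises ZeroDivisionError when no row falls inside the range.
    return sum(selected) / len(selected)


vectorized_result = y_average(df, 2, 3)
loop_result = y_average_loop(df, 2, 3)
print(vectorized_result)
print(loop_result)
```

Both versions select the same seven rows here. One difference worth knowing: when no x falls inside the range, the mask version returns NaN, while the loop version divides by zero.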