So I have 3 columns. ETA (eta/km*100) (a number), Climate, and month.
My purpose is to drop the values higher that 0.95 quartile and lower than 0.05 (the extreme cases on this dataset) for each subset of 3 months and Climate, and the reagroup the dataset on a single dataset.
The issue I'm having here is that even tho inside the "for" statement it does the job, when I print the resulting data frame, it only has the last subset (Hurricane, last 3 months) without dropping the extreme data.
I've tried concat, add and append. Not sure what I'm doing wrong here.
Climate = ['Sunny', 'Cloudy', 'Foggy', 'Rain', 'Storm', 'Hurricane']
newDf = df_Cl
newDf.iloc[0:0]
for cl in Climate:
print (cl)
for num in range(4, 14, 3):
print (num)
df_Temp = df.loc[(df['Climate'] == cl) & (df['month'] < num)]
bajo = df_Temp['eta/km*100'].quantile(0.05)
alto = df_Temp['eta/km*100'].quantile(0.95)
df_Temp = df_Temp[df_Temp['eta/km*100'] > bajo]
df_Temp = df_Temp[df_Temp['eta/km*100'] < alto]
newDf.add(df_Temp)
I've also tried:
newDf += df_Temp
But all the values become "NaN"
Use:
Climate = ['Sunny', 'Cloudy', 'Foggy', 'Rain', 'Storm', 'Hurricane']
#filter only rows by Climate list
df1 = df[df['Climate'].isin(Climate)]
#create groups per Climate and each 3 months
g = df1.groupby(['Climate', df['month'].sub(1).floordiv(3)])['eta/km*100']
#filter between 0.05 and 0.95 quantile
out = df1[df1['eta/km*100'].between(g.quantile(0.05),
g.quantile.quantile(0.95), inclusive='neither')]
Your solution working if append final df_Temp
to lsit of DataFrames and last use concat
for join together:
L = []
for cl in Climate:
print (cl)
for num in range(4, 14, 3):
print (num)
df_Temp = df.loc[(df['Climate'] == cl) & (df['month'] < num)]
bajo = df_Temp['eta/km*100'].quantile(0.05)
alto = df_Temp['eta/km*100'].quantile(0.95)
df_Temp = df_Temp[df_Temp['eta/km*100'] > bajo]
df_Temp = df_Temp[df_Temp['eta/km*100'] < alto]
L.append(df_Temp)
out = pd.concat(L)