Search code examples
pythonpandasdataframedatasetdata-science

How to add a dataframe to another on python


So I have 3 columns. ETA (eta/km*100) (a number), Climate, and month.

My purpose is to drop the values higher that 0.95 quartile and lower than 0.05 (the extreme cases on this dataset) for each subset of 3 months and Climate, and the reagroup the dataset on a single dataset.

The issue I'm having here is that even tho inside the "for" statement it does the job, when I print the resulting data frame, it only has the last subset (Hurricane, last 3 months) without dropping the extreme data.

I've tried concat, add and append. Not sure what I'm doing wrong here.

Climate = ['Sunny', 'Cloudy', 'Foggy', 'Rain', 'Storm', 'Hurricane']
newDf = df_Cl
newDf.iloc[0:0]


for cl in Climate:
    print (cl)
    for num in range(4, 14, 3):
        print (num)
        df_Temp = df.loc[(df['Climate'] == cl) & (df['month'] < num)]
        bajo = df_Temp['eta/km*100'].quantile(0.05)
        alto = df_Temp['eta/km*100'].quantile(0.95)
        df_Temp = df_Temp[df_Temp['eta/km*100'] > bajo]
        df_Temp = df_Temp[df_Temp['eta/km*100'] < alto]
        newDf.add(df_Temp)

I've also tried:

newDf += df_Temp

But all the values become "NaN"


Solution

  • Use:

    Climate = ['Sunny', 'Cloudy', 'Foggy', 'Rain', 'Storm', 'Hurricane']
    
    #filter only rows by Climate list
    df1 = df[df['Climate'].isin(Climate)]
    
    #create groups per Climate and each 3 months
    g = df1.groupby(['Climate', df['month'].sub(1).floordiv(3)])['eta/km*100']
    
    #filter between 0.05 and 0.95 quantile
    out = df1[df1['eta/km*100'].between(g.quantile(0.05),
                                        g.quantile.quantile(0.95), inclusive='neither')]
    

    Your solution working if append final df_Temp to lsit of DataFrames and last use concat for join together:

    L = []
    for cl in Climate:
        print (cl)
        for num in range(4, 14, 3):
            print (num)
            df_Temp = df.loc[(df['Climate'] == cl) & (df['month'] < num)]
            bajo = df_Temp['eta/km*100'].quantile(0.05)
            alto = df_Temp['eta/km*100'].quantile(0.95)
            df_Temp = df_Temp[df_Temp['eta/km*100'] > bajo]
            df_Temp = df_Temp[df_Temp['eta/km*100'] < alto]
            L.append(df_Temp)
            
    out = pd.concat(L)