Tags: python, pandas, dataframe, data-analysis, exploratory-data-analysis

Loop over columns in a pandas DataFrame (Python)


I want to loop over two columns of a specific DataFrame and access the data by column name, but the code below raises a TypeError on line 3:

i=0
for name,value in df.iteritems():
 
  q1=df[name].quantile(0.25)
  q3=df[name].quantile(0.75)
  IQR=q3-q1
  min=q1-1.5*IQR
  max=q3+1.5*IQR
  minout=df[df[name]<min]
  maxout=df[df[name]>max]
  new_df=df[(df[name]<max) & (df[name]>min)]
  i+=1
  if i==2:
    break

Solution

  • It looks like you want to exclude outliers based on the 1.5*IQR rule. Here is a simpler solution:

    Input dummy data:

    import numpy as np
    import pandas as pd
    np.random.seed(0)
    df = pd.DataFrame({'col%s' % (i+1): np.random.normal(size=1000)
                       for i in range(4)})
    

    Removing the outliers in the first two columns (keep rows where Q1 - 1.5*IQR < value < Q3 + 1.5*IQR holds for both):

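    # Quartiles of the first two columns (one value per column)
    Q1 = df.iloc[:, :2].quantile(.25)
    Q3 = df.iloc[:, :2].quantile(.75)
    IQR = Q3-Q1
    
    # True where a value lies strictly inside the fences of its column
    non_outliers = (df.iloc[:, :2] > Q1-1.5*IQR) & (df.iloc[:, :2] < Q3+1.5*IQR)
    
    # Keep only the rows that are inside the fences in both columns
    new_df = df[non_outliers.all(axis=1)]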
    

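If you would rather keep an explicit loop and address the columns by name, as in the question, here is a minimal sketch. It assumes the listed columns are numeric; `drop_iqr_outliers`, `demo` and the `label` column are hypothetical names made up for illustration. Two things differ from the question's loop. `DataFrame.iteritems()` was removed in pandas 2.0 (`items()` is the replacement, although iterating over a list of names is enough here). The row mask is also accumulated across columns instead of reassigning `new_df` from `df` on every pass, which would keep only the last column's filter. The names `lower` and `upper` avoid shadowing the built-ins `min` and `max`. The question does not show the full traceback, so whether a non-numeric column caused the TypeError is only a guess; restricting the loop to known numeric columns sidesteps that case.

```python
import numpy as np
import pandas as pd


def drop_iqr_outliers(df, columns, k=1.5):
    """Keep rows whose values lie strictly inside the IQR fences of every listed column."""
    keep = pd.Series(True, index=df.index)  # start by keeping every row
    for name in columns:
        col = df[name]
        q1 = col.quantile(0.25)
        q3 = col.quantile(0.75)
        iqr = q3 - q1
        lower = q1 - k * iqr
        upper = q3 + k * iqr
        # Combine with the masks from the columns processed so far
        keep &= (col > lower) & (col < upper)
    return df[keep]


# Hypothetical demo data: two numeric columns plus one text column
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "col1": rng.normal(size=1000),
    "col2": rng.normal(size=1000),
    "label": ["a", "b"] * 500,  # non-numeric, deliberately left out of the loop
})

cleaned = drop_iqr_outliers(demo, ["col1", "col2"])
```

Passing the column names explicitly means the function never touches columns you did not ask for, so the text column is carried through to `cleaned` unchanged.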