pyspark

Pyspark - Loop over dataframe columns by list


New to pyspark. I'm just trying to loop over the columns that exist in a variable list. This is what I've tried, but it doesn't work.

column_list = ['colA','colB','colC']
for col in df:
   if col in column_list:
      df = df.withColumn(...)
   else:
      pass

It's definitely an issue with the loop. I feel like I'm missing something really simple here. I performed the same operation independently on each column and it ran cleanly, i.e.

df = df.withColumn(...'colA').withColumn(...'colB').withColumn(...'colC')

Solution

  • Iterate over `df.columns` instead of over the DataFrame itself. Iterating a PySpark DataFrame does not yield column-name strings, so the `col in column_list` membership test can never behave as intended; `df.columns` is a plain Python list of column names, which is exactly what the loop needs. The `else: pass` branch is redundant and can be dropped:

    column_list = ['colA','colB','colC']
    for col in df.columns:
       if col in column_list:
          df = df.withColumn(....)

  • Minor note: the loop variable `col` shadows `pyspark.sql.functions.col` if you've imported it, so a name like `col_name` is safer.
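The core of the fix is that the loop must visit column-name strings rather than the DataFrame itself. A minimal sketch of that selection logic, with `df.columns` stood in by a plain list (the names here are hypothetical):

```python
# Hypothetical stand-in for df.columns, which in PySpark is simply a
# Python list of column-name strings.
df_columns = ['colA', 'colB', 'colC', 'colD']
column_list = ['colA', 'colB', 'colC']

# Because the loop now visits strings, the membership test works as
# intended; here we just record which columns would be transformed.
to_transform = [name for name in df_columns if name in column_list]
```

In the real loop, each matching name would be passed to `df.withColumn(...)` as in the snippet above; columns not in `column_list` (like `colD` here) are simply skipped.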