Tags: python, sql, dataframe, pyspark

PySpark: Select all columns except particular columns


I have a large number of columns in a PySpark DataFrame, say 200. I want to select all of them except, say, 3-4 of the columns. How do I select these columns without having to manually type the names of every column I want to keep?


Solution

  • In the end, I settled on the following (a runnable sketch follows after this list):

    • Drop:

      df.drop('column_1', 'column_2', 'column_3')

    • Select:

      df.select([c for c in df.columns if c not in {'column_1', 'column_2', 'column_3'}])
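For reference, here is a minimal end-to-end sketch of both approaches; the DataFrame contents and the `excluded` set are made-up placeholders for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("select-all-but-some").getOrCreate()

    # Toy DataFrame; only the column names matter for this example.
    df = spark.createDataFrame(
        [(1, "a", 10.0, True), (2, "b", 20.0, False)],
        ["id", "column_1", "column_2", "column_3"],
    )

    excluded = {"column_1", "column_2", "column_3"}

    # Option 1: drop takes column names as varargs and returns a new
    # DataFrame without them (names that don't exist are silently ignored).
    dropped = df.drop(*excluded)

    # Option 2: build the keep-list explicitly and select it.
    selected = df.select([c for c in df.columns if c not in excluded])

    dropped.show()    # both results contain only the 'id' column
    selected.show()

drop is the more direct choice for a fixed handful of names; the select/list-comprehension form is handy when the keep-list needs more logic than a simple exclusion set (e.g. filtering by prefix or dtype). Note that both return a new DataFrame rather than modifying df in place.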