Tags: python, dataframe, pyspark, user-defined-functions, preprocessor

How do I apply my single_space function to a large number of dataframe columns?


I am using a function to turn all runs of whitespace in a pyspark dataframe into single spaces. I am able to apply this function to separate columns individually using .withColumn. Now, I have around 120 columns of mixed types and I would like to apply this function only to the string columns. For that, I created a list containing only the names of the string-typed columns. How do I feed (apply, map?) this list to my function using withColumn?

import pandas as pd
import quinn
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# example data
data = {
    'fruits': ["apples", "    banana", "cherry"],
    'veggies': [1, 0, 1],
    'meat': ["pig", "cow", "   chicken  "]}

df = pd.DataFrame(data)
ddf = spark.createDataFrame(df)

# names of the string-typed columns (use the Spark dataframe's dtypes, not pandas')
mylist_column = [item[0] for item in ddf.dtypes if item[1].startswith('string')]
ddf = ddf.withColumn('fruits', quinn.single_space('fruits'))

Solution

  • for element in mylist_column:
        ddf = ddf.withColumn(element, quinn.single_space(element))
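To see what the loop does to each string value, here is a minimal pure-Python sketch of the normalization quinn.single_space performs on a column: runs of whitespace collapse to a single space and leading/trailing whitespace is trimmed. The function name single_space is reused here for illustration; this is plain Python, not the Spark column expression quinn builds.

    import re

    def single_space(s: str) -> str:
        # Collapse any run of whitespace to one space, then trim the ends,
        # mirroring the per-value effect of quinn.single_space on a column.
        return re.sub(r"\s+", " ", s).strip()

    single_space("   chicken  ")   # -> "chicken"
    single_space("    banana")     # -> "banana"

Applied via the loop above, every string column in mylist_column gets this cleanup while the non-string columns (like veggies) are left untouched.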