Tags: apache-spark, pyspark, double

How to convert all int dtypes to double simultaneously in PySpark


Here's my dataset:

DataFrame[column1: double, column2: double, column3: int, column4: int, column5: int, ... , column300: int]

What I want is:

DataFrame[column1: double, column2: double, column3: double, column4: double, column5: double, ... , column300: double]

What I did was dataset.withColumn("column3", dataset.column3.cast(DoubleType()))

This is too manual; can you show me how to cast all the int columns at once?
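
For reference, a minimal sketch of that per-column approach (assuming the DataFrame is named dataset; DoubleType comes from pyspark.sql.types):

    from pyspark.sql.types import DoubleType

    # Cast one column at a time -- this would have to be repeated
    # for each of the ~300 int columns.
    dataset = dataset.withColumn("column3", dataset.column3.cast(DoubleType()))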


Solution

  • You can use a list comprehension over df.dtypes to build the converted column list, then select it in one pass.

    import pyspark.sql.functions as F
    ...
    # df.dtypes is a list of (column name, dtype string) pairs.
    # Cast every int column to double; keep every other column as-is.
    cols = [
        F.col(name).cast('double') if dtype == 'int' else F.col(name)
        for name, dtype in df.dtypes
    ]
    df = df.select(cols)
    df.printSchema()
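
For example, here is a self-contained sketch of the same pattern (the SparkSession setup, schema, and sample rows are illustrative assumptions, not part of the original question):

    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType

    spark = SparkSession.builder.getOrCreate()

    # Small illustrative DataFrame: two double columns and two int columns.
    schema = StructType([
        StructField("column1", DoubleType()),
        StructField("column2", DoubleType()),
        StructField("column3", IntegerType()),
        StructField("column4", IntegerType()),
    ])
    df = spark.createDataFrame([(1.0, 2.0, 3, 4), (5.0, 6.0, 7, 8)], schema)

    # df.dtypes -> [('column1', 'double'), ('column2', 'double'),
    #               ('column3', 'int'), ('column4', 'int')]
    cols = [
        F.col(name).cast('double') if dtype == 'int' else F.col(name)
        for name, dtype in df.dtypes
    ]
    df = df.select(cols)
    df.printSchema()  # every column is now reported as double

Note that df.dtypes reports LongType columns as 'bigint' rather than 'int', so if some of your integer columns are longs you may want to match on both strings.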