I have a table with 372 columns, many of which have the "long" datatype, and I want to cast those columns to "int". I found a solution in another similar question asked here, but it isn't working for me.
from pyspark.sql.functions import col
schema = {col: col_type for col, col_type in df.dtypes}
time_cols = [col for col, col_type in schema.items() if col_type in "timestamp date".split() or "date" in col or "time" in col]
for column in time_cols:
    df = df.withColumn(column, col(column).cast("to_timestamp"))
The snippet you found fails because .cast() expects a data type name such as "timestamp" or "int", not a function name like "to_timestamp". For your case (long/bigint to int) you can do all the casts in one pass. It's also good practice to use a single .select instead of many .withColumn calls, since each .withColumn adds another projection to the query plan, which gets expensive with hundreds of columns.
# A single projection: cast every bigint (long) column to int and keep the rest as-is.
df = df.select(
    [col(c).cast('int') if t == 'bigint' else c for c, t in df.dtypes]
)
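For reference, here is a minimal, self-contained sketch of the same approach on a toy DataFrame (the local SparkSession, column names, and values are made up for illustration). Printing df.dtypes before and after shows the bigint columns becoming int:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Python ints are inferred as LongType (bigint) by default.
df = spark.createDataFrame([(1, 10, "a"), (2, 20, "b")], ["id", "count", "name"])
print(df.dtypes)  # [('id', 'bigint'), ('count', 'bigint'), ('name', 'string')]

# Same pattern as above: one select that downcasts every bigint column.
df = df.select(
    [col(c).cast('int') if t == 'bigint' else c for c, t in df.dtypes]
)
print(df.dtypes)  # [('id', 'int'), ('count', 'int'), ('name', 'string')]

One caveat: downcasting long to int overflows for values outside the 32-bit range. With spark.sql.ansi.enabled the cast raises an error, otherwise the value silently wraps, so make sure your data actually fits in an int.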