I have a very wide DataFrame in Spark. It has 80 columns, and I want to set one column to 0 and the rest to 1.

For the column I want to set to 0, I tried

df = df.withColumn("set_zero_column", lit(0))

and it worked.

Now I want to set the remaining columns to 1. How do I do that without specifying all 79 names?

Any help is appreciated.
Use select with a list comprehension:
from pyspark.sql.functions import lit
set_one_columns = [lit(1).alias(c) for c in df.columns if c != "set_zero_column"]
df = df.select(lit(0).alias("set_zero_column"), *set_one_columns)
Note that the version above moves set_zero_column to the front. If you need to maintain the original column order, you can instead build the full column list in one comprehension:
cols = [lit(0).alias(c) if c == "set_zero_column" else lit(1).alias(c) for c in df.columns]
df = df.select(*cols)