I'm trying to apply the round function to the DataFrame returned by df.summary(),
excluding the summary column. So far I've tried using select()
with a list comprehension, e.g.
df2 = df.select(*[round(column, 2).alias(column) for column in df.columns])
This is the output of df2. The categorical values in the summary column get converted into NULL:
+---------+-------+-------+-------+-------+
| Summary | col 1 | col 2 | col 3 | col 4 |
+---------+-------+-------+-------+-------+
| NULL | 0 | 0.1 | 0.2 | 0.3 |
+---------+-------+-------+-------+-------+
| NULL | 1 | 1.1 | 1.2 | 1.3 |
+---------+-------+-------+-------+-------+
| NULL | 2 | 2.1 | 2.2 | 2.3 |
+---------+-------+-------+-------+-------+
I want only columns[1:] to be rounded, like this:
+---------+-------+-------+-------+-------+
| Summary | col 1 | col 2 | col 3 | col 4 |
+---------+-------+-------+-------+-------+
| min | 0 | 0.1 | 0.2 | 0.3 |
+---------+-------+-------+-------+-------+
| max | 1 | 1.1 | 1.2 | 1.3 |
+---------+-------+-------+-------+-------+
| stddev | 2 | 2.1 | 2.2 | 2.3 |
+---------+-------+-------+-------+-------+
I've also tried slicing with df.columns[1:], but then the summary column isn't selected at all:
df2 = df.select(*[round(column, 2).alias(column) for column in df.columns[1:]])
+-------+-------+-------+-------+
| col 4 | col 1 | col 2 | col 3 |
+-------+-------+-------+-------+
| 0.3 | 0 | 0.1 | 0.2 |
+-------+-------+-------+-------+
| 1.3 | 1 | 1.1 | 1.2 |
+-------+-------+-------+-------+
| 2.3 | 2 | 2.1 | 2.2 |
+-------+-------+-------+-------+
The NULLs most likely appear because rounding the summary column forces its strings (min, max, stddev) to be cast to a number, which fails. To exclude the first column from rounding, select it unchanged and apply round only to the remaining columns. You could try the following:
columns_to_round = df.columns[1:]
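# Backticks keep column names that contain spaces (e.g. "col 1") valid inside the SQL expression
rounded_df = df.selectExpr("Summary", *[f"round(`{column}`, 2) as `{column}`" for column in columns_to_round])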
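If the column names may contain spaces or other special characters, it can help to build the expression strings in a small helper before passing them to selectExpr. The sketch below is only an illustration: round_exprs, keep, and digits are hypothetical names, not part of PySpark, and it assumes Spark's default case-insensitive column resolution and that a backtick inside an identifier is escaped by doubling it.

```python
def round_exprs(columns, keep=("summary",), digits=2):
    """Build selectExpr strings: pass `keep` columns through, round all others.

    Hypothetical helper for illustration; not part of PySpark.
    """
    # Compare names case-insensitively (assumes Spark's default setting).
    keep_lower = {name.lower() for name in keep}
    exprs = []
    for name in columns:
        # Backtick-quote the identifier so names like "col 1" stay valid SQL.
        quoted = "`" + name.replace("`", "``") + "`"
        if name.lower() in keep_lower:
            # Leave label columns (e.g. the summary column) untouched.
            exprs.append(quoted)
        else:
            exprs.append(f"round({quoted}, {digits}) as {quoted}")
    return exprs


# Example with the column names from the question:
exprs = round_exprs(["summary", "col 1", "col 2"])
```

With the DataFrame from the question, this could then be used as df.selectExpr(*round_exprs(df.columns)), which keeps the original column order and leaves the summary labels intact.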