I have a table that looks something like this:
name | value |
---|---|
optionA | 123 |
optionB | 50 |
optionC | 32 |
optionD | 4 |
optionE | 2 |
optionF | 1 |
I don't need the specifics for values lower than 10, so I just want to merge those rows into a single row named "other" whose value is the sum of the low values. The result would look like this:
name | value |
---|---|
optionA | 123 |
optionB | 50 |
optionC | 32 |
other | 7 |
What's the easiest way to do that in Databricks using PySpark?
You can relabel the low-value rows first, then group and sum:

```python
from pyspark.sql import functions as F

# Rows with value < 10 get the name "other"; everything else keeps its name
name = F.expr("IF(value < 10, 'other', name)")

df.withColumn('name', name).groupBy('name').agg(F.sum('value').alias('value')).show()
```

```
+-------+-----+
|   name|value|
+-------+-----+
|optionA|  123|
|optionB|   50|
|optionC|   32|
|  other|    7|
+-------+-----+
```
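For a quick sanity check of the relabel-then-aggregate logic itself, here is the same idea sketched in plain Python (no Spark needed; the threshold of 10 and the data come from the question):

```python
# Data and threshold taken from the question
rows = [("optionA", 123), ("optionB", 50), ("optionC", 32),
        ("optionD", 4), ("optionE", 2), ("optionF", 1)]

merged = {}
for name, value in rows:
    # Same relabeling rule as the PySpark expression: low values become "other"
    key = "other" if value < 10 else name
    merged[key] = merged.get(key, 0) + value

print(merged)
# {'optionA': 123, 'optionB': 50, 'optionC': 32, 'other': 7}
```

The "other" bucket sums to 4 + 2 + 1 = 7, matching the PySpark output above.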