Summarize low values into one

I have a table that looks something like this:

name	value
optionA	123
optionB	50
optionC	32
optionD	4
optionE	2
optionF	1

I don't need the specifics for values lower than 10, so I just want to merge those rows into a single row named "other" whose value is the sum of those low values. The result would look like this:

name	value
optionA	123
optionB	50
optionC	32
other	7

What's the easiest way to do that in Databricks using PySpark?

Solution

Relabel every row whose value is below 10 as 'other', then group by name and sum the values:

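from pyspark.sql import functions as F

# Rows with value < 10 get the name 'other'; all other rows keep their name
name = F.expr("IF (value < 10, 'other', name)")
# Group on the relabeled name and sum, so the low rows collapse into one
df.withColumn('name', name).groupby('name').agg(F.sum('value').alias('value')).show()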

+-------+-----+
|   name|value|
+-------+-----+
|optionA|  123|
|optionB|   50|
|optionC|   32|
|  other|    7|
+-------+-----+
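
To sanity-check the expected result without a cluster, the same relabel-then-sum logic can be sketched in plain Python. This is only an illustration of what the Spark expression computes, not a replacement for it; the function name merge_low_values and its threshold and label parameters are hypothetical, and the rows are copied from the question's table.

```python
from collections import defaultdict

# Sample rows copied from the table in the question
rows = [
    ("optionA", 123),
    ("optionB", 50),
    ("optionC", 32),
    ("optionD", 4),
    ("optionE", 2),
    ("optionF", 1),
]


def merge_low_values(rows, threshold=10, label="other"):
    """Relabel rows whose value is below threshold, then sum values per name."""
    totals = defaultdict(int)
    for name, value in rows:
        # Same rule as IF (value < 10, 'other', name) in the Spark expression
        key = label if value < threshold else name
        totals[key] += value
    return dict(totals)


result = merge_low_values(rows)
print(result)
```

Note that the comparison is strict, so a row with a value of exactly 10 keeps its own name, matching the value < 10 condition in the answer.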
