Tags: pyspark, databricks

How to sum row-wise data from a single column in PySpark


I have a dataframe as shown below.

df1:

id Name
1  1*1+0*1
2  1*0+0*0
3  0*0+1+1

The desired output should be:

df2:

id result
1  1
2  0
3  2

How can I achieve this using a PySpark dataframe?


Solution

  • You can use Python's built-in eval function inside a UDF to evaluate each expression string. Following is an example.

    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession
    
    
    spark = SparkSession.builder.appName("EvaluateExpressions").getOrCreate()
    
    
    data = [
        (1, '1*1+0*1'),
        (2, '1*0+0*0'),
        (3, '0*0+1+1')
    ]
    schema = ["id", "Name"]
    df1 = spark.createDataFrame(data=data, schema=schema)
    
    
    def evaluate_exp(given_exp):
        # eval runs the arithmetic expression held in the string;
        # only use this on trusted input
        return eval(given_exp)
    
    
    # Without an explicit returnType, the UDF returns a string column
    evaluate_exp_udf = F.udf(evaluate_exp)
    
    df1.show(n=100, truncate=False)
    
    df_result = df1.withColumn("result_python", evaluate_exp_udf(F.col("Name")))
    print("Result using PYTHON UDF")
    df_result.show(n=100, truncate=False)
    

    Output:

    +---+-------+
    |id |Name   |
    +---+-------+
    |1  |1*1+0*1|
    |2  |1*0+0*0|
    |3  |0*0+1+1|
    +---+-------+
    
    Result using PYTHON UDF
    +---+-------+-------------+
    |id |Name   |result_python|
    +---+-------+-------------+
    |1  |1*1+0*1|1            |
    |2  |1*0+0*0|0            |
    |3  |0*0+1+1|2            |
    +---+-------+-------------+