I have a dataframe as shown below.
df1:
id  Name
1   1*1+0*1
2   1*0+0*0
3   0*0+1+1
The desired output should be:
df2:
1   1
2   0
3   2
How can I achieve this using a PySpark dataframe?
You can use Python's eval function inside a UDF to evaluate the expression. Note that eval executes arbitrary Python code, so only use it on trusted input. Following is an example.
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("EvaluateExpressions").getOrCreate()

# Sample data with the arithmetic expressions stored as strings
data = [
    (1, '1*1+0*1'),
    (2, '1*0+0*0'),
    (3, '0*0+1+1')
]
schema = ["id", "Name"]
df1 = spark.createDataFrame(data=data, schema=schema)

# UDF that evaluates the expression string with Python's eval;
# F.udf returns a string column by default
def evaluate_exp(given_exp):
    return eval(given_exp)

evaluate_exp_udf = F.udf(evaluate_exp)

df1.show(n=100, truncate=False)

df_result = df1.withColumn("result_python", evaluate_exp_udf(F.col("Name")))
print("Result using PYTHON UDF")
df_result.show(n=100, truncate=False)
Output:
+---+-------+
|id |Name |
+---+-------+
|1 |1*1+0*1|
|2 |1*0+0*0|
|3 |0*0+1+1|
+---+-------+
Result using PYTHON UDF
+---+-------+-------------+
|id |Name |result_python|
+---+-------+-------------+
|1 |1*1+0*1|1 |
|2 |1*0+0*0|0 |
|3 |0*0+1+1|2 |
+---+-------+-------------+
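If you want a numeric result column and better performance on larger data, the same idea can be expressed as a vectorized pandas UDF. This is a minimal sketch, assuming pandas and pyarrow are available on the cluster; evaluate_exp_pandas and result_pandas are just illustrative names, not part of the original answer.

import pandas as pd
import pyspark.sql.functions as F

# Vectorized pandas UDF: receives a pandas Series of expression strings
# and returns a pandas Series of evaluated results, typed as long.
@F.pandas_udf("long")
def evaluate_exp_pandas(exprs: pd.Series) -> pd.Series:
    # eval runs arbitrary Python, so only apply it to trusted expression strings
    return exprs.apply(eval)

df_result = df1.withColumn("result_pandas", evaluate_exp_pandas(F.col("Name")))
df_result.show(truncate=False)

The resulting column is a proper long column, so you can aggregate or filter on it without casting, unlike the plain Python UDF above, which returns strings by default.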