I am trying to create a new variable called k whose value depends on whether metric is I or M; otherwise I want to return an empty value.
Thanks in advance for your answer :)
data = [["1", "Amit", "DU", "I", "8", "6"],
["2", "Mohit", "DU", "I", "4", "2"],
["3", "rohith", "BHU", "I", "5", "3"],
["4", "sridevi", "LPU", "I", "1", "6"],
["1", "sravan", "KLMP", "M", "2", "4"],
["5", "gnanesh", "IIT", "M", "6", "8"],
["6", "gnadesh", "KLM", "c", "10", "9"]]
columns = ['ID', 'NAME', 'college', 'metric', 'x', 'y']
dataframe = spark.createDataFrame(data, columns)
+---+-------+-------+------+---+---+
| ID|   NAME|college|metric|  x|  y|
+---+-------+-------+------+---+---+
|  1|   Amit|     DU|     I|  8|  6|
|  2|  Mohit|     DU|     I|  4|  2|
|  3| rohith|    BHU|     I|  5|  3|
|  4|sridevi|    LPU|     I|  1|  6|
|  1| sravan|   KLMP|     M|  2|  4|
|  5|gnanesh|    IIT|     M|  6|  8|
|  6|gnadesh|    KLM|     c| 10|  9|
+---+-------+-------+------+---+---+
I tried the following, but it does not work:
dataframe= dataframe.withColumn('k', when ((col('metric') == 'M',(dataframe['metric'] / 10)))
.when ((col('metric') == 'I',(dataframe['metric'] / 10 * 2,54)))
.otherwise (' '))
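from pyspark.sql.functions import col, lit, when
# when() takes the condition and the value as two separate arguments,
# and the decimal separator is a point (2.54), not a comma.
# Note: 'metric' is kept from the question; it holds strings, so a numeric column such as 'x' may have been intended.
dataframe = dataframe.withColumn('k', when(col('metric') == 'M', dataframe['metric'] / 10)
    .when(col('metric') == 'I', dataframe['metric'] / 10 * 2.54)
    .otherwise(lit(' ')))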
Or
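from pyspark.sql.functions import col, lit, when
# Same fixes as above (two arguments to when(), 2.54 with a decimal point);
# lit(None) leaves k null instead of a blank string when metric is neither M nor I.
dataframe = dataframe.withColumn('k', when(col('metric') == 'M', dataframe['metric'] / 10)
    .when(col('metric') == 'I', dataframe['metric'] / 10 * 2.54)
    .otherwise(lit(None)))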
I am guessing you're getting the error in the otherwise part of the code. The argument for DataFrame.withColumn should be of type Column, which ' ' isn't.
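Apart from the otherwise argument, two plain-Python details in the attempt are worth checking: the extra parentheses in when ((condition, value)) pass a single tuple instead of two arguments, and 2,54 is read as two values rather than the number 2.54. The sketch below illustrates both without Spark. Note that when_like is a hypothetical stand-in that only mimics the two-parameter shape of when(condition, value); it is not the real PySpark function.

```python
# Hypothetical stand-in with the same two-parameter shape as
# pyspark.sql.functions.when(condition, value). It is NOT the real function.
def when_like(condition, value):
    return (condition, value)

# 1) Extra parentheses turn the two arguments into one tuple argument.
try:
    when_like((True, 0.8))        # a single positional argument: the tuple (True, 0.8)
    wrapped_call_failed = False
except TypeError:
    wrapped_call_failed = True    # 'value' was never supplied

unwrapped = when_like(True, 0.8)  # two separate arguments, as intended

# 2) A decimal comma builds a tuple instead of a float.
x = 8
with_comma = (x / 10 * 2,54)      # a 2-tuple whose second element is 54
with_point = x / 10 * 2.54        # a single float

print(wrapped_call_failed, type(with_comma).__name__, type(with_point).__name__)
```

So in the PySpark code, the condition and the value should be passed as when(condition, value) without the wrapping parentheses, and the factor should be written as 2.54.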