python, dataframe, pyspark, apache-spark-sql

How to create a new column in a dataframe whose value is derived from other columns of the dataframe


I have a dataframe with columns a and b. I want to create, in the same dataframe, another column whose value for each row is a*b. How do I do that?

I tried a few examples, but none of them work:

short_df['Revenue'] = short_df.(lambda row: (row['UnitPrice']*row['Quantity']))
display(short_df.limit(10))

Solution

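  • Unless I'm missing something, there is a trivial solution. PySpark DataFrames are immutable and do not support item assignment such as short_df['Revenue'] = ..., so use withColumn, which returns a new DataFrame with the extra column: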

    import pyspark.sql.functions as F
    
    short_df = short_df.withColumn('Revenue', F.col('UnitPrice') * F.col('Quantity'))