i have 2 statements which are to my knowledge exactly alike, but select()
works fine, but selectExpr()
generates following results.
+-----------------------+----------------------+
|first(StockCode, false)|last(StockCode, false)|
+-----------------------+----------------------+
| 85123A| 22138|
+-----------------------+----------------------+
+-----------+----------+
|first_value|last_value|
+-----------+----------+
| StockCode| StockCode|
+-----------+----------+
following is implementation.
df.select(first(col("StockCode")), last(col("StockCode"))).show()
df.selectExpr("""first('StockCode') as first_value""", """last('StockCode') as last_value""").show()
Can any 1 explain the behaviour?
selectExpr
takes everything as select
clause in sql.
Hence if you write anything in single quote'
, it will act as string in sql. if you wanted to pass the column to selectExpr
use backtique (`) as below-
df.selectExpr("""first(`StockCode`) as first_value""", """last(`StockCode`) as last_value""").show()
backtique will help you to escape space in the column.
you can use without backtique also if your column name is not starting with number like 12col
or it doesn't have spaces in between like column name
df.selectExpr("""first(StockCode) as first_value""", """last(StockCode) as last_value""").show()