I am trying to change the type of a column from string to Datetime using the code below (in a Databricks notebook).
import org.apache.spark.sql.functions._
val df\ = df.withColumn("end",col("end").cast(DateType))
df\.printSchema()
Or like this:
df.createOrReplaceTempView("CastExample")
val df4 = spark.sql("DATE(end) from CastExample")
df4.printSchema()
df4.show(false)
But I get this error:
SyntaxError: invalid syntax
File "<command-1642181972810133>", line 2
val df4 = spark.sql("DATE(end) from CastExample")
^
SyntaxError: invalid syntax
About "val": it seems to mean "immutable reference" or something similar, but I cannot find any information about it online. Many code examples use it, but none of them explain why it is there, or perhaps I am searching for it the wrong way. It looks like it comes from Scala, but I am not sure. Maybe I did not import something.
I would appreciate any advice on it.
Do not use 'val': that is Scala syntax, and the SyntaxError shows that the cell is being run as Python. The query also needs a SELECT, and if you want all columns of df in df4, use *.
df.createOrReplaceTempView("CastExample")
df4 = spark.sql("SELECT *, DATE(end) as new_name from CastExample")
df4.printSchema()
df4.show(10,False)
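To see why the original cell fails before Spark is even involved, here is a minimal sketch using Python's standard ast module. The helper name parses_as_python is made up for illustration; the two strings are the line from the question and the corrected line from this answer.

```python
import ast

# The line from the question (Scala syntax) and the corrected Python line.
scala_line = 'val df4 = spark.sql("DATE(end) from CastExample")'
python_line = 'df4 = spark.sql("SELECT *, DATE(end) as new_name from CastExample")'

def parses_as_python(source):
    """Return True if the source is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)  # only checks syntax; names such as spark are not resolved
        return True
    except SyntaxError:
        return False

print(parses_as_python(scala_line))   # False
print(parses_as_python(python_line))  # True
```

The Python parser rejects the first line because of the leading "val", which is the same SyntaxError the notebook reports. The SQL inside the string plays no part in that error.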
You can achieve the same conversion with the PySpark DataFrame API:
from pyspark.sql.functions import to_date
df4 = df.select(to_date(df.end).alias('new_name'))
df4.show(10, False)