I am trying to learn Spark. I am reading a DataFrame with a timestamp column and parsing it with the unix_timestamp
function as below:
import spark.implicits._ // spark is the SparkSession (predefined in spark-shell)
import org.apache.spark.sql.functions.unix_timestamp

val columnName = "TIMESTAMPCOL"
val sequence = Seq("2016-01-20 12:05:06.999")
val dataframe = sequence.toDF(columnName)
val typeDataframe = dataframe.withColumn(columnName, unix_timestamp($"TIMESTAMPCOL"))
typeDataframe.show
This produces the following output:
+------------+
|TIMESTAMPCOL|
+------------+
| 1453320306|
+------------+
How can I read it so that I don't lose the milliseconds, i.e. the .999
part? I tried using unix_timestamp(s: Column, p: String),
where p is a SimpleDateFormat pattern, e.g. "yyyy-MM-dd hh:mm:ss", without any luck.
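For reference, here is a minimal sketch of that attempt (withPattern is just an illustrative name; dataframe is the one built above):

import org.apache.spark.sql.functions.unix_timestamp

// Even with an explicit pattern, unix_timestamp returns whole seconds
// (a Long), so the .999 is still dropped.
val withPattern = dataframe.withColumn(columnName, unix_timestamp($"TIMESTAMPCOL", "yyyy-MM-dd hh:mm:ss"))
withPattern.show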
unix_timestamp returns whole seconds since the epoch (as a Long), so the milliseconds are always dropped, no matter what pattern you pass in. To retain the milliseconds, use the "yyyy-MM-dd HH:mm:ss.SSS" format with date_format,
like below.
import org.apache.spark.sql.functions.date_format

val typeDataframe = dataframe.withColumn(columnName, date_format($"TIMESTAMPCOL", "yyyy-MM-dd HH:mm:ss.SSS"))
typeDataframe.show(false) // show(false) so the 23-character value isn't truncated
This will give you
+-----------------------+
|TIMESTAMPCOL |
+-----------------------+
|2016-01-20 12:05:06.999|
+-----------------------+
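If what you actually want is a numeric Unix timestamp that keeps the milliseconds, rather than a formatted string, one sketch (epochMillis is just an illustrative name) is to cast the column to timestamp and then to double, where the fractional part carries the milliseconds:

import org.apache.spark.sql.functions.col

// Casting timestamp -> double yields seconds with a fractional part,
// so multiplying by 1000 gives epoch milliseconds,
// e.g. 1453320306999 for the row above.
val epochMillis = dataframe.withColumn(columnName,
  (col(columnName).cast("timestamp").cast("double") * 1000).cast("long"))
epochMillis.show(false)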