scala, apache-spark, intellij-idea, apache-spark-sql, parquet

How to select a 13-digit timestamp column from a parquet file, convert it to a date, and store it as a data frame?


Since I am a newbie to Apache Spark and Scala, I want to do the following:

- Read a specific column (a 13-digit timestamp) from the parquet file.

- Convert the timestamp to an ordinary date format (yyyy-MM-dd HH:mm:ss).

- Store the result as another column in the dataset.

I can read the timestamp using the following code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object Test {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TEST_APP").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlcon = new SQLContext(sc)

    // Read the parquet file, keep the 13-digit timestamp of successful rows,
    // and name the resulting column 13digitTime.
    val Testdata = sqlcon.read.parquet("D:\\TestData.parquet")
    val data_eve_type_end = Testdata
      .select(Testdata.col("heading.timestamp"))
      .where(Testdata.col("status").equalTo("Success"))
      .toDF("13digitTime")
  }
}

I tried to convert the timestamp using the reference link below:

https://stackoverflow.com/a/54354790/9493078

But it isn't working for me. I don't actually know whether I have fetched the data into a dataframe correctly or not. In any case, the output is a table with the column name 13digitTime and 13-digit numbers as its values.
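
To see what actually landed in the dataframe, the schema and a few rows can be printed with the standard Dataset methods (this is plain Spark API, nothing specific to my data):

// Print the inferred schema and a few rows so the column type and values can be checked.
data_eve_type_end.printSchema()
data_eve_type_end.show(5, false)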

When I try the code from the link mentioned above, it shows this error:

WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '(`13digitTime` / 1000000)' due to data type mismatch:

I am expecting a data frame with two columns, one containing the 13-digit timestamp and the other containing the time converted from the 13-digit value into the general date format (yyyy-MM-dd HH:mm:ss).

I would appreciate a solution. Thanks in advance.


Solution

  • sqlcon.read.parquet returns a dataframe itself. All you need to do is add a new column using the withColumn method. This should work.

    val data_eve_type_end = Testdata.withColumn("13digitTime", from_unixtime($"heading.timestamp"))
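
    (As a side note, from_unixtime and the $"col" syntax need imports that aren't shown in the question; a minimal pair, assuming the sqlcon SQLContext from the question, would be:)

    // from_unixtime lives in org.apache.spark.sql.functions; implicits enables the $"col" syntax.
    import org.apache.spark.sql.functions.from_unixtime
    import sqlcon.implicits._

    Also note that from_unixtime expects seconds, which is why the 13-digit millisecond value is divided by 1000 below.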
    

    I updated my code like this, converting the 13-digit Unix time into a 10-digit value by dividing by 1000 and casting it to timestamp.

    val date_conv = data_eve_type_end.select(
      col("timestamp_value").as("UNIX TIME"),
      from_unixtime(col("timestamp_value") / 1000).cast("timestamp").as("GENERAL TIME")
    )
    

    and the output looks like this:

    +-------------+-------------------+
    |    UNIX TIME|       GENERAL TIME|
    +-------------+-------------------+
    |1551552902793|  2019-03-0 6:55:02|
    +-------------+-------------------+
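
    For completeness, here is a minimal end-to-end sketch of the same flow using the newer SparkSession entry point; the file path and the column names (heading.timestamp, status) are simply the ones from the question, so adjust them as needed:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_unixtime}

    object TimestampConversion {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("TEST_APP")
          .master("local")
          .getOrCreate()

        // Read the parquet file and keep only rows whose status is "Success".
        val testData = spark.read.parquet("D:\\TestData.parquet")

        val result = testData
          .where(col("status") === "Success")
          .select(
            col("heading.timestamp").as("UNIX TIME"),
            // from_unixtime expects seconds, so divide the 13-digit millisecond value by 1000.
            from_unixtime(col("heading.timestamp") / 1000).cast("timestamp").as("GENERAL TIME")
          )

        result.show(false)
        spark.stop()
      }
    }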