I have a "Date" column that is a String in a Spark DF in this format 1/1/2000 12:53 AM, 1/1/2000 2:53 AM, 1/1/2000 5:53 AM, ... I am trying to create a new column that converts this column into a Unix Timestamp but getting a column full of null as my output. The line I am using to create this column is:
val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "MM/dd/yyyy hh:mm:ss a"))
I created the Date column by concatenating separate Month, Day, Year, and Time columns, but the Month and Day values are single digits like 1 rather than 01. Is this why I'm getting a null column back, or is there another reason? And if so, how do I convert the Day and Month values from 1 to 01, 2 to 02, and so on?
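For reference, here is a simplified, self-contained sketch of roughly how I build the Date column (my real code is equivalent; the sample values are made up to match my data):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.concat_ws

val spark = SparkSession.builder().appName("example").master("local[*]").getOrCreate()
import spark.implicits._

// Month and Day come in as single digits like "1" rather than "01"
val parts = Seq(
  ("1", "1", "2000", "12:53 AM"),
  ("1", "1", "2000", "2:53 AM")
).toDF("Month", "Day", "Year", "Time")

// Join the pieces into "1/1/2000 12:53 AM"
val Old_DF = parts.withColumn(
  "Date",
  concat_ws(" ", concat_ws("/", $"Month", $"Day", $"Year"), $"Time")
)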
This is my first time working with timestamps and I am new to Scala, so I greatly appreciate the help.
You can specify just one letter M, d, and h. Spark will use that as the minimum number of digits the field contains, so single-digit values like 1 parse without zero-padding. Note also that your timestamp strings have no seconds, so you should not include :ss in the pattern; that stray :ss is what makes every row come back null.
val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "M/d/yyyy h:mm a"))
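As a quick self-contained check (a minimal sketch assuming a local SparkSession; the app name and master are arbitrary), the corrected pattern parses all three of your sample strings:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.unix_timestamp

val spark = SparkSession.builder().appName("ts-check").master("local[*]").getOrCreate()
import spark.implicits._

val Old_DF = Seq("1/1/2000 12:53 AM", "1/1/2000 2:53 AM", "1/1/2000 5:53 AM").toDF("Date")

val New_DF = Old_DF.withColumn("Timestamp", unix_timestamp($"Date", "M/d/yyyy h:mm a"))
New_DF.show(false)
// Every row now gets a non-null epoch-seconds value; the exact numbers
// depend on your session time zone.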
See https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html for more details on date and time pattern letters. In particular:
Number: For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
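To answer your side question: you do not need zero-padded values for parsing, but if you ever want 01-style strings (e.g. for display), lpad does it. A minimal sketch, assuming Old_DF still has the original Month and Day columns:

import org.apache.spark.sql.functions.{col, lpad}

// Left-pad to width 2 with '0': "1" -> "01", "12" stays "12"
val padded = Old_DF
  .withColumn("Month", lpad(col("Month").cast("string"), 2, "0"))
  .withColumn("Day", lpad(col("Day").cast("string"), 2, "0"))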