pyspark, databricks

Integer to date conversion within Databricks


I have a date in string format, e.g. 20240107193154, within a Databricks dataframe, and I want to convert it to a YYYY-MM-DD:HH:MM:SS datetime format in PySpark.

I have tried using unix_timestamp, but with no luck.


Solution

  • We'll need two functions to achieve your goal: F.to_timestamp to parse the string into a timestamp, and F.date_format to render that timestamp in the desired string format.

    Here's an example of how we might achieve this:

    import pyspark.sql.functions as F
    from pyspark.sql.types import StructType, StructField, StringType
    
    # example data
    data = [
      ['a', '20240104193154'],
      ['b', '20240105193154'],
      ['c', '20240106193154'],
      ['d', '20240107193154'],
      ['e', '20240108193154'],
      ['f', '20240109193154']]
    
    schema = StructType([
      StructField("Event", StringType(), True),
      StructField("StringTimeStamp", StringType(), True)
    ])
    
    # create dataframe
    df = spark.createDataFrame(data, schema)
    
    # F.to_timestamp converts the string format to a timestamp
    # F.date_format converts the timestamp to desired string format
    (
      df
      .withColumn('Timestamp', F.to_timestamp(F.col('StringTimeStamp'), 'yyyyMMddHHmmss'))
      # 'HH' is the 24-hour clock; 'hh' would give 12-hour values
      .withColumn('FormattedString', F.date_format(F.col('Timestamp'), 'yyyy-MM-dd:HH:mm:ss'))
    ).show()
    

    We can see the results:

    +-----+---------------+-------------------+-------------------+
    |Event|StringTimeStamp|          Timestamp|    FormattedString|
    +-----+---------------+-------------------+-------------------+
    |    a| 20240104193154|2024-01-04 19:31:54|2024-01-04:19:31:54|
    |    b| 20240105193154|2024-01-05 19:31:54|2024-01-05:19:31:54|
    |    c| 20240106193154|2024-01-06 19:31:54|2024-01-06:19:31:54|
    |    d| 20240107193154|2024-01-07 19:31:54|2024-01-07:19:31:54|
    |    e| 20240108193154|2024-01-08 19:31:54|2024-01-08:19:31:54|
    |    f| 20240109193154|2024-01-09 19:31:54|2024-01-09:19:31:54|
    +-----+---------------+-------------------+-------------------+
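    As a side note, the unix_timestamp route from the question can also work, as long as the parse pattern matches the input string exactly (a mismatched pattern is a common reason it silently returns nulls). A minimal sketch, reusing the df defined above:

    # unix_timestamp parses the string into seconds since the epoch,
    # and from_unixtime formats those seconds back into a string
    (
      df
      .withColumn(
        'FormattedString',
        F.from_unixtime(
          F.unix_timestamp(F.col('StringTimeStamp'), 'yyyyMMddHHmmss'),
          'yyyy-MM-dd:HH:mm:ss'
        )
      )
    ).show()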