
Creating a timestamp column using PySpark


I'd like to create a new timestamp column on a DataFrame from a date column and a string column that holds the time.

Date        Times (Sting)   desired column
2020-11-03  15:34:02        2020-11-03 15:34:02

I'm trying something like this in the select statement, but I get an error. Can anyone help?

F.to_timestamp(F.concat_ws('', F.col("Date"), F.col("Time"), 'yyyy-MM-dd HH:mm:ss')).alias("desired_column")

Solution

  • You can concatenate the two columns with pyspark functions and then parse the result with to_timestamp:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as sf
    
    spark = SparkSession.builder.getOrCreate()
    
    # Sample data frame matching the question's columns
    df = spark.createDataFrame([('2020-11-03', '15:34:02')], ['Date', 'Times (Sting)'])
    
    df.show()
    
    # Join date and time with a space, then parse the string into a timestamp
    df = df.withColumn(
        'desired column',
        sf.to_timestamp(
            sf.concat(sf.col('Date'), sf.lit(' '), sf.col('Times (Sting)')),
            'yyyy-MM-dd HH:mm:ss',
        ),
    )
    
    df.show()
    
