Search code examples
rapache-sparksparklyr

Convert variable as Timestamp in sparklyr


I know similar question has been asked multiple times before but I have tried all those options and still not get desired result.

I have a sdf as kl in following format:

   CONSUMER_ID   TimeStamp        TimeStamp2        
          <dbl>    <dbl>                <chr>              
 1       958    20171201000002 20171201000002        
 2       985    20171201000005 20171201000005  

I want to convert Timestamp in format "yyyy/MM/dd H:M:S"

I have tried multiple options which I found online but nothing works for this format.

Ex:

library(sparklyr)             
library(dplyr)

ms_rech_10 = kl %>% mutate(time_2 = date(timestamp(unix_timestamp(TimeStamp2))))

ms_rech_12 = kl %>% mutate(time_2 = date(TimeStamp2))

ms_rech_12 = kl %>% mutate(time_2 = to_date(TimeStamp2))

But in each code I got following output:

  CONSUMER_ID      TimeStamp     TimeStamp2     time_2    
          <dbl>          <dbl>       <chr>       <date>    
 1        958     20171201000002 20171201000002   NA        
 2        985     20171201000005 20171201000005   NA 

Please provide the solution in following format.

    CONSUMER_ID      TimeStamp     TimeStamp2     time_2    
          <dbl>          <dbl>       <chr>       <dttm>    
 1        958     20171201000002 20171201000002   2017/12/01 00:00:02        
 2        985     20171201000005 20171201000005   2017/12/01 00:00:05 

Thanks!!


Solution

  • If you need formatted string (you cannot have formatted timestamp):

    df <- copy_to(sc, tibble(Timestamp2=c("20171201000002", "20171201000005")))
    
    df %>%  
      mutate(time2 = from_unixtime(
        unix_timestamp(Timestamp2, "yyyyMMddHHmmss"), "yyyy/MM/dd HH:mm:ss"))
    
    # Source:   lazy query [?? x 2]
    # Database: spark_connection
      Timestamp2     time2              
      <chr>          <chr>              
    1 20171201000002 2017/12/01 00:00:02
    2 20171201000005 2017/12/01 00:00:05
    

    otherwise just use to_timestamp(Timestamp2, "yyyyMMddHHmmss")