Tags: apache-spark, pyspark, time, timestamp, comparison

Check if timestamp is inside range


I'm trying to obtain the following:

+---------+---------+
|work_time|day_shift|
+---------+---------+
| 00:45:40|       No|
| 10:05:47|      Yes|
| 15:25:28|      Yes|
| 19:38:52|       No|
+---------+---------+

where I classify each "work_time" value into "day_shift":
"Yes" - if the time falls between 09:00:00 and 18:00:00
"No" - otherwise

My "work_time" is in datetime format showing only the time. I tried the following, but I'm just getting "No" for everything.

df = df.withColumn('day_shift', when((df.work_time >= to_timestamp(lit('09:00:00'), 'HH:mm:ss')) & (df.work_time <= to_timestamp(lit('18:00:00'), 'HH:mm:ss')), 'Yes').otherwise('No'))

Solution

  • You can use the Column class method between. It works for both timestamps and strings in the format "HH:mm:ss", and both bounds are inclusive. Use this:

    F.col("work_time").between("09:00:00", "18:00:00")
    

    Full test:

    from pyspark.sql import functions as F
    df = spark.createDataFrame([('00:45:40',), ('10:05:47',), ('15:25:28',), ('19:38:52',)], ['work_time'])
    
    day_shift = F.col("work_time").between("09:00:00", "18:00:00")
    df = df.withColumn("day_shift", F.when(day_shift, "Yes").otherwise("No"))
    
    df.show()
    # +---------+---------+
    # |work_time|day_shift|
    # +---------+---------+
    # | 00:45:40|       No|
    # | 10:05:47|      Yes|
    # | 15:25:28|      Yes|
    # | 19:38:52|       No|
    # +---------+---------+
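
    If your "work_time" is an actual timestamp column rather than a string, a minimal sketch (assuming an active spark session; the sample dates and column names below are illustrative) is to extract the time of day with date_format and reuse the same between check:

    from pyspark.sql import functions as F
    # Hypothetical input where work_time is a real timestamp, not a string
    df_ts = spark.createDataFrame(
        [('2023-01-01 00:45:40',), ('2023-01-01 10:05:47',)],
        ['raw'],
    ).withColumn('work_time', F.to_timestamp('raw'))
    
    # Pull out the "HH:mm:ss" part as a string, then apply the same range check
    day_shift = F.date_format('work_time', 'HH:mm:ss').between('09:00:00', '18:00:00')
    df_ts = df_ts.withColumn('day_shift', F.when(day_shift, 'Yes').otherwise('No'))
    
    df_ts.show()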