I'm trying to obtain the following:
+---------+---------+
|work_time|day_shift|
+---------+---------+
| 00:45:40|       No|
| 10:05:47|      Yes|
| 15:25:28|      Yes|
| 19:38:52|       No|
+---------+---------+
where I classify each "work_time" into "day_shift":
"Yes" - if the time falls between 09:00:00 and 18:00:00
"No" - otherwise
My "work_time" is in datetime format showing only the time. I tried the following, but I'm just getting "No" for everything.
df = df.withColumn('day_shift', when((df.work_time >= to_timestamp(lit('09:00:00'), 'HH:mm:ss')) & (df.work_time <= to_timestamp(lit('18:00:00'), 'HH:mm:ss')), 'Yes').otherwise('No'))
You can use the Column class method between. It works for both timestamps and strings in the format "HH:mm:ss", and it is inclusive on both bounds. Use this:
F.col("work_time").between("09:00:00", "18:00:00")
Full test:
from pyspark.sql import functions as F
df = spark.createDataFrame([('00:45:40',), ('10:05:47',), ('15:25:28',), ('19:38:52',)], ['work_time'])
day_shift = F.col("work_time").between("09:00:00", "18:00:00")
df = df.withColumn("day_shift", F.when(day_shift, "Yes").otherwise("No"))
df.show()
# +---------+---------+
# |work_time|day_shift|
# +---------+---------+
# | 00:45:40| No|
# | 10:05:47| Yes|
# | 15:25:28| Yes|
# | 19:38:52| No|
# +---------+---------+
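If your work_time is a real timestamp column rather than an "HH:mm:ss" string (you mention a datetime format), one way to handle it is to extract the time of day with date_format first and compare that. A sketch, with made-up sample timestamps for illustration:

from pyspark.sql import functions as F

# Hypothetical setup: work_time is an actual timestamp, not a string
df = spark.createDataFrame(
    [('2024-01-01 00:45:40',), ('2024-01-01 10:05:47',)], ['work_time']
).withColumn('work_time', F.to_timestamp('work_time'))

# date_format() pulls out just the time of day as an "HH:mm:ss" string,
# which between() can then compare exactly as above
day_shift = F.date_format('work_time', 'HH:mm:ss').between('09:00:00', '18:00:00')
df = df.withColumn('day_shift', F.when(day_shift, 'Yes').otherwise('No'))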