I have a Spark DataFrame with multiple columns, and I want to extract the date embedded in one of them into a separate column. Given these two rows:
'www.freelancer/hello/there/I/am/2024/01/03/every/woijf123oijroa.fiow.com'
'www.freelancer/camping/fun/2024/02/14/foijaoijf83747199.1.com'
Expected date output:
2024/01/03
2024/02/14
df.withColumn('date', split(col('website'), '/')[5])
doesn't work because the date doesn't sit at a fixed position between the forward slashes, and even if it did, the result would be only the text between two slashes rather than a value spanning several of them.
I also tried using locate()
to find the index at which the date starts and then take the 10 characters from that index, but that didn't work as expected.
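One approach that might fit here is a regular expression, since the date always has the same shape (four digits, two digits, two digits, separated by slashes) even though its position in the URL varies. The sketch below assumes the column is named `website`, as in the attempt above; the helper `add_date_column` and the constant `DATE_PATTERN` are hypothetical names introduced only for illustration. It uses `regexp_extract` from `pyspark.sql.functions`, and checks the pattern against the two sample rows with Python's `re` module.

```python
import re

# Four digits, a slash, two digits, a slash, two digits.
# The position of the date inside the URL does not matter.
DATE_PATTERN = r"\d{4}/\d{2}/\d{2}"


def add_date_column(df, source_col="website", target_col="date"):
    """Return df with target_col holding the first yyyy/MM/dd match in source_col."""
    # Imported here so the pattern check below runs even without Spark installed.
    from pyspark.sql.functions import col, regexp_extract

    # Group index 0 means the whole match; rows without a match get an empty string.
    return df.withColumn(target_col, regexp_extract(col(source_col), DATE_PATTERN, 0))


# Quick check of the pattern in plain Python on the two sample rows.
samples = [
    "www.freelancer/hello/there/I/am/2024/01/03/every/woijf123oijroa.fiow.com",
    "www.freelancer/camping/fun/2024/02/14/foijaoijf83747199.1.com",
]
extracted = [re.search(DATE_PATTERN, s).group(0) for s in samples]
print(extracted)
```

With a real DataFrame this would be called as `add_date_column(df).show(truncate=False)`. If an actual date type is needed rather than a string, wrapping the result in `to_date(col('date'), 'yyyy/MM/dd')` should work. Note that the pattern returns only the first match, so a URL containing an earlier segment that happens to look like a date would need a stricter pattern.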