I have the following table which I am trying to wrangle in GCP Data prep:
Timestamp Event
2018-04-01 0
2018-04-02 0
2018-04-03 0
2018-04-04 0
2018-04-05 1
2018-04-06 0
2018-04-07 0
2018-04-08 0
I am trying to transform it in a way such that if Event is 1, then the previous 3 entries in the Event are set to 1 and the next 2 entries in Event are set to 2.
So, essentially the data set will look like the below after transformation
Timestamp Event
2018-04-01 0
2018-04-02 1
2018-04-03 1
2018-04-04 1
2018-04-05 1
2018-04-06 2
2018-04-07 2
2018-04-08 0
I have tried to use window and conditionals to achieve this, but w/o success. Any ideas on how this transformation can be achieved? I am open to splitting the column or creating a new derived column if that can help achieve this result. Thanks!
You can use window functions as part of your conditions in your IF statements. Using the PREV and NEXT window functions you can get the values at X rows above or below the current row in your window. Once you got the values, you can compare if they match the expected value and shape your IF statement accordingly.
For your use case, you need to verify if the PREV value at 1 or 2 position prior is equal to one and replace these rows by the number 2. If not true, if the NEXT value at position 1, 2 or 3 is equal to 1, the rows should be replaced with the number 1. Lastly, you need to check if the value at the current row is 1 and replace the remaining rows with 0. Converting this into a formula accepted by Dataprep would look like the following:
IF(PREV(Event, 1) == 1 || PREV(Event, 2) == 1, 2, IF(NEXT(Event, 1) == 1 || NEXT(Event, 2) == 1 || NEXT(Event, 3) == 1, 1, IF(Event == 1, 1, 0)))
To enter this formula on Dataprep, under the Function tab, select “Custom Formula”. Under the custom formula window, set the formula type to “Multiple row formula” as the PREV and NEXT function requires an additional argument specifying which column to sort by.