r datetime group-by summarize data-wrangling

R: Grouping and summarizing data to find the nth nearest event after date of reference

I have been working with a dataset that contains the CLIENT (identification), date of creation of an account (DATE ENTER) and the GATE use for customers in order to place orders in a website (Table A), but some of these attempts fail and that why a second data frame (Table B) is provided with the CLIENT (identification), the ERROR_DATE and the MAGIC_IP that identifies the error as a side note all GATES and all MAGIC_IPs will always be different even if this implies the same customer identification.

Table A

GATE	CLIENT	DATE_ENTER
A	1	1
B	1	1
C	1	3
D	1	4
E	1	5
F	1	6
G	1	7
H	1	8
I	1	9
J	1	10
K	1	11
L	1	12
M	1	16

Table B

HOST	CLIENT	DATE_ERROR
14558556	1	2
14558557	1	3
14558558	1	4
14558559	1	5
14558560	1	6
14558561	1	7
14558562	3	8
14558563	2	9
14558564	3	10
14558565	3	11
14558566	1	12
14558567	1	13

What I want to create is a table where I can see for each "CLIENT","DATE ENTER" and "GATE" what was the ERROR_DATE and MAGIC_IP immediately after or in other words the very next one. The expected result should look like this:

GATE	CLIENT	DATE_ENTER	DATE_ERROR	MAGIC_IP
A	1	1	2	154
B	1	1	2	154
C	1	3	4	156
D	1	4	5	157
E	1	5	6	158
F	1	6	7	159
G	1	7	10	160
H	1	8	10	160
I	1	9	10	160
J	1	10	11	163
K	1	11	12	164
L	1	12	13	165
M	1	16	N.A	N.A

In this table I am able to see for each GATE. CLIENT AND "DATE_ENTER" what was the very next or the closest nearest MAGIC_IP and ERROR_DATE for each client, thank you so much to everyone willing to help me out,

What have I tried?

I was thinking about grouping by CLIENT,DATE_ENTER and GATE after left joining the two tables by CLIENT, and then create a variable that would subtract all the dates and pick for each group the smallest positive value as an index to correlate the information about MAGIC_IP and ERROR_DATE, but I cant seem to find a way to do so at the moment thank you for all of your help guys

Data:

TableA<-tribble(~GATE,  ~CLIENT,    ~DATE_ENTER,
"A",    "1",    "1",
"B",    "1",    "1",
"C",    "1",    "3",
"D",    "1",    "4",
"E",    "1",    "5",
"F",    "1",    "6",
"G",    "1",    "7",
"H",    "1",    "8",
"I",    "1",    "9",
"J",    "1",    "10",
"K",    "1",    "11",
"L",    "1",    "12")

TableB<-tribble(~HOST,  ~CLIENT,    ~DATE_ERROR,
                "14558556", "1",    "2",
                "14558557", "1",    "3",
                "14558558", "1",    "4",
                "14558559", "1",    "5",
                "14558560", "1",    "6",
                "14558561", "1",    "7",
                "14558562", "3",    "8",
                "14558563", "2",    "9",
                "14558564", "3",    "10",
                "14558565", "3",    "11",
                "14558566", "1",    "12",
                "14558567", "1",    "13")

Solution

Join TableA and TableB by CLIENT and select the row which is the smallest positive difference between DATE_ERROR and DATE_ENTER.

library(dplyr)

TableA %>%
  mutate(row = row_number()) %>%
  left_join(TableB, by = 'CLIENT') %>%
  type.convert(as.is = TRUE) %>%
  group_by(row) %>%
  filter({tmp = DATE_ERROR - DATE_ENTER
          tmp == min(tmp[tmp > 0], na.rm = TRUE)
          }) %>%
  ungroup() %>%
  select(-row)

GATE	CLIENT	DATE_ENTER	DATE_ERROR	MAGIC_IP
A	1	1	2	154
B	1	1	2	154
C	1	3	4	156
D	1	4	5	157
E	1	5	6	158
F	1	6	7	159
G	1	7	10	160
H	1	8	10	160
I	1	9	10	160
J	1	10	11	163
K	1	11	12	164
L	1	12	13	165
M	1	16	N.A	N.A

GATE	CLIENT	DATE_ENTER	DATE_ERROR	MAGIC_IP
A	1	1	2	154
B	1	1	2	154
C	1	3	4	156
D	1	4	5	157
E	1	5	6	158
F	1	6	7	159
G	1	7	10	160
H	1	8	10	160
I	1	9	10	160
J	1	10	11	163
K	1	11	12	164
L	1	12	13	165
M	1	16	N.A	N.A

GATE	CLIENT	DATE_ENTER	DATE_ERROR	MAGIC_IP
A	1	1	2	154
B	1	1	2	154
C	1	3	4	156
D	1	4	5	157
E	1	5	6	158
F	1	6	7	159
G	1	7	10	160
H	1	8	10	160
I	1	9	10	160
J	1	10	11	163
K	1	11	12	164
L	1	12	13	165
M	1	16	N.A	N.A