I am trying to write code in R for a dataset to check if DAYS column have consecutive numbers and print out the missing DAYS number, in such a way that, if the count of missing consecutive numbers between two rows of the DAYS column equals to that count+1 in the corresponding last row of the PERIOD column, exclude it from the output. For example, consider the two rows in DAYS column 163 and 165, where the count of missing number is 1. But in this case, the last row (where DAYS is 165) has PERIOD value of 2, that is (count+1). So, exclude this missing value (164) from the output. However if you look at DAYS 170 and 172,y you can see 172 has PERIOD value of 1 (not 2 or count+1). So, show this output (171).
Here is the first 28 rows of the dataset.
DAYS PERIOD
146 1
147 1
148 1
149 1
150 1
151 1
152 1
153 1
154 1
155 1
156 1
157 1
158 1
159 1
160 1
161 1
162 1
163 1
165 2
166 1
167 1
168 1
169 1
170 1
172 1
173 1
174 1
175 1
I tried
First, created a sequence of expected DAYS
values
expected_days <- seq(min(hs$DAYS), max(hs$DAYS))
Then, find the missing DAYS
values
missing_days <- setdiff(expected_days, hs$DAYS)
How to do the next bit?
I've managed to do this using tidyverse tools:
I've tweaked your data slightly to show that the solution can handle longer runs of missing days.
library(vroom)
library(dplyr)
library(tidyr)
test <-
vroom(
I(
"days period
161 1
162 1
163 1
166 3
167 1
168 1
169 1
170 1
172 1
"),
col_types = c("ii"))
all_days <- min(test[["days"]]):max(test[["days"]])
frame <- tibble(days = all_days)
test <-
right_join(test, frame, by = "days") |>
arrange(days)
test
#> # A tibble: 12 × 2
#> days period
#> <int> <int>
#> 1 161 1
#> 2 162 1
#> 3 163 1
#> 4 164 NA
#> 5 165 NA
#> 6 166 3
#> 7 167 1
#> 8 168 1
#> 9 169 1
#> 10 170 1
#> 11 171 NA
#> 12 172 1
test <-
mutate(test,
no_na = xor(is.na(period), is.na(lag(period))),
missingness_group = cumsum(no_na)) |>
select(-no_na)
test <-
group_by(test, missingness_group) |>
mutate(missing_days =
case_when(
all(is.na(period)) ~ n(),
TRUE ~ 0)) |>
ungroup() |>
select(-missingness_group)
test
#> # A tibble: 12 × 3
#> days period missing_days
#> <int> <int> <dbl>
#> 1 161 1 0
#> 2 162 1 0
#> 3 163 1 0
#> 4 164 NA 2
#> 5 165 NA 2
#> 6 166 3 0
#> 7 167 1 0
#> 8 168 1 0
#> 9 169 1 0
#> 10 170 1 0
#> 11 171 NA 1
#> 12 172 1 0
test <- mutate(test, extra_days = period - 1)
test <- fill(test, extra_days, .direction = "up")
test <-
filter(test, !is.na(period) | missing_days > extra_days) |>
select(days, period)
test
#> # A tibble: 10 × 2
#> days period
#> <int> <int>
#> 1 161 1
#> 2 162 1
#> 3 163 1
#> 4 166 3
#> 5 167 1
#> 6 168 1
#> 7 169 1
#> 8 170 1
#> 9 171 NA
#> 10 172 1
Created on 2023-06-01 with reprex v2.0.2