Add missing rows to data frame in R based on a maximum length

I'm a bit stuck on how to approach this problem. I've looked into the complete() function, but I'm not sure it works for my problem.

I have a data frame that is missing some data points that I would like to fill in. Basically, in this task, a participant reads a sentence word-by-word and responds YES or NO to a question. When they respond NO, the sentence ends and there are no data points left. What I'd like to do is fill in those missing data points with NOs.

Here is a sample data frame:

test <- tibble(
  participant = c(001, 001, 001, 002, 002, 003, 003, 003, 003, 001, 002, 002, 003, 003, 003, 003, 003),
  sentence_id = c("dog_sentence", "dog_sentence", "dog_sentence", "dog_sentence", "dog_sentence", "dog_sentence", "dog_sentence", "dog_sentence", "dog_sentence", "plant_sentence", "plant_sentence", "plant_sentence", "plant_sentence", "plant_sentence", "plant_sentence", "plant_sentence", "plant_sentence"),
  word = c("the", "dog", "went", "the", "dog", "the", "dog", "went", "home.", "I", "I", "watered", "I", "watered", "my", "plants", "today."),
  word_position = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 1, 2, 1, 2, 3, 4, 5),
  max_length = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5),
  response = c("YES", "YES", "NO", "YES", "NO", "YES", "YES", "YES", "YES", "NO", "YES", "NO", "YES", "YES", "YES", "YES", "YES")
)

Which looks like this:

participant	sentence_id	word	word_position	max_length	response
001	dog_sentence	the	1	4	YES
001	dog_sentence	dog	2	4	YES
001	dog_sentence	went	3	4	NO
002	dog_sentence	the	1	4	YES
002	dog_sentence	dog	2	4	NO
003	dog_sentence	the	1	4	YES
003	dog_sentence	dog	2	4	YES
003	dog_sentence	went	3	4	YES
003	dog_sentence	home.	4	4	YES
001	plant_sentence	I	1	5	NO
002	plant_sentence	I	1	5	YES
002	plant_sentence	watered	2	5	NO
003	plant_sentence	I	1	5	YES
003	plant_sentence	watered	2	5	YES
003	plant_sentence	my	3	5	YES
003	plant_sentence	plants	4	5	YES
003	plant_sentence	today.	5	5	YES

I'd like to fill in the missing data points whenever a participant says NO so that each data point is filled in between the word position at which they said NO and the max_length position. So, for participant 001's plant_sentence, there should be three rows filled in underneath that have word positions 2, 3, 4, and 5 and retain all of the other static information in columns 1-5 (bonus points if "response" can be filled in with NO for the new rows, but I can just do that after the fact if the new rows just say NA).

Thank you for your help in advance!

Solution

Something like this (using dplyr_1.1.0 or newer):

library(dplyr)
group_by(test, participant, sentence_id, max_length) %>%
  reframe(word_position = seq_len(first(max_length))) %>%
  full_join(test) %>%
  mutate(response = coalesce(response, "NO"))
# Joining with `by = join_by(participant, sentence_id, max_length, word_position)`
# # A tibble: 27 × 6
#    participant sentence_id    max_length word_position word  response
#          <dbl> <chr>               <dbl>         <dbl> <chr> <chr>   
#  1           1 dog_sentence            4             1 the   YES     
#  2           1 dog_sentence            4             2 dog   YES     
#  3           1 dog_sentence            4             3 went  NO      
#  4           1 dog_sentence            4             4 <NA>  NO      
#  5           1 plant_sentence          5             1 I     NO      
#  6           1 plant_sentence          5             2 <NA>  NO      
#  7           1 plant_sentence          5             3 <NA>  NO      
#  8           1 plant_sentence          5             4 <NA>  NO      
#  9           1 plant_sentence          5             5 <NA>  NO      
# 10           2 dog_sentence            4             1 the   YES     
# # ℹ 17 more rows
# # ℹ Use `print(n = ...)` to see more rows