How to replace an entire row between two rows based on a column

I am dealing with a very large mRNA splicing dataset. Here is a toy dataset that illustrates the problem:

test_df <- data.frame(
  start = c(2, 9, 13, 19, 13, 20, 25, 35, 39),
  end = c(8, 12, 18, 24, 16, 24, 30, 38, 45),
  gene_id = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  exon_identity = c(NA, "Upstream", NA, "Downstream", "Event", NA, "Upstream", "Downstream", NA)
)

> test_df
  start end gene_id exon_identity
1     2   8       A          <NA>
2     9  12       A      Upstream
3    13  18       A          <NA>
4    19  24       A    Downstream
5    13  16       A         Event
6    20  24       B          <NA>
7    25  30       B      Upstream
8    35  38       B    Downstream
9    39  45       B          <NA>

For every unique value in the gene_id column, I would like to replace the row that sits between the "Upstream" and "Downstream" rows of the exon_identity column with the row labeled "Event", e.g. replace row 3 with row 5 (so that row 5 takes row 3's place and no longer appears separately). What makes this difficult for me is that some genes in the gene_id column have no row that needs to be replaced, e.g. gene "B".

This question is related to previously asked questions here and here.

Based on those and other resources, I have tried:

library(tidyverse)

test_replace <- test_df %>% 
  group_by(gene_id) %>% 
  mutate(start = replace(start, row_number() > which(exon_identity == "Upstream") & row_number() < which(exon_identity == "Downstream"), start[exon_identity == "Event"]),
         end = replace(end, row_number() > which(exon_identity == "Upstream") & row_number() < which(exon_identity == "Downstream"), end[exon_identity == "Event"]),
         exon_identity = replace(exon_identity, row_number() > which(exon_identity == "Upstream") & row_number() < which(exon_identity == "Downstream"), "Event")
         )


Warning message:
There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `start = replace(...)`.
ℹ In group 1: `gene_id = "A"`.
Caused by warning in `x[list] <- values`:
! number of items to replace is not a multiple of replacement length
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning. 
> 
> test_replace
# A tibble: 9 × 4
# Groups:   gene_id [2]
  start   end gene_id exon_identity
  <dbl> <dbl> <chr>   <chr>        
1     2     8 A       NA           
2     9    12 A       Upstream     
3    NA    NA A       Event        
4    19    24 A       Downstream   
5    13    16 A       Event        
6    20    24 B       NA           
7    25    30 B       Upstream     
8    35    38 B       Downstream   
9    39    45 B       NA
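For reference, a minimal base R illustration of where the warning and the NA values in row 3 most likely come from (the vectors are copied from gene "A"; the names exon and exon_start are hypothetical). `==` returns NA wherever exon_identity is NA, and subsetting with an NA index returns NA, so start[exon_identity == "Event"] yields three values instead of one. replace() then keeps only the first of them, which is NA, and warns about the length mismatch.

```r
# Values of gene "A" from the toy data
exon <- c(NA, "Upstream", NA, "Downstream", "Event")
exon_start <- c(2, 9, 13, 19, 13)

# `==` propagates NA, so the logical index contains NA entries
idx <- exon == "Event"                 # NA FALSE NA FALSE TRUE

# Indexing with NA returns NA, so three values come back instead of one
exon_start[idx]                        # NA NA 13

# which() keeps only the TRUE positions, giving the single Event start
exon_start[which(exon == "Event")]     # 13
```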

Desired output:


> desired_outcome 
  start end gene_id exon_identity
1     2   8       A          <NA>
2     9  12       A      Upstream
3    13  16       A         Event
4    19  24       A    Downstream
5    20  24       B          <NA>
6    25  30       B      Upstream
7    35  38       B    Downstream
8    39  45       B          <NA>
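A minimal base R sketch of the per-gene replacement described above, assuming each gene has at most one "Upstream", one "Downstream" and one "Event" row, and exactly one other row between Upstream and Downstream to overwrite. The helper name replace_sandwiched() is hypothetical. Base R is used only so the sketch runs without extra packages; the same helper could presumably be passed to dplyr's group_modify(), e.g. group_modify(~ replace_sandwiched(.x)).

```r
# Toy data from the question
test_df <- data.frame(
  start = c(2, 9, 13, 19, 13, 20, 25, 35, 39),
  end = c(8, 12, 18, 24, 16, 24, 30, 38, 45),
  gene_id = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  exon_identity = c(NA, "Upstream", NA, "Downstream", "Event", NA, "Upstream", "Downstream", NA)
)

# Hypothetical helper: processes the rows of a single gene
replace_sandwiched <- function(g) {
  # %in% treats NA as "no match", unlike ==
  up <- which(g$exon_identity %in% "Upstream")
  down <- which(g$exon_identity %in% "Downstream")
  event <- which(g$exon_identity %in% "Event")
  # Genes without exactly one Upstream, Downstream and Event row are left unchanged
  if (length(up) != 1 || length(down) != 1 || length(event) != 1) return(g)
  rows <- seq_len(nrow(g))
  # Rows strictly between Upstream and Downstream, excluding the Event row itself
  between <- setdiff(which(rows > up & rows < down), event)
  if (length(between) != 1) return(g)
  g[between, ] <- g[event, ]  # overwrite the sandwiched row with the Event row
  g[-event, ]                 # then drop the original Event row
}

result <- do.call(rbind, lapply(split(test_df, test_df$gene_id), replace_sandwiched))
rownames(result) <- NULL
result
```

Genes such as "B", which have no "Event" row, go through the early return untouched, which is what makes the mixed case work.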

A solution, preferably using the tidyverse, would be highly appreciated.

Thank you!

Solution

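In the toy example, sorting your data set by start position gets you almost all of what you want: the "Event" row lands right next to the row it should replace, and that row can then be dropped. Will that work on the real data set? For example: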

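library(tidyverse)
test_df |>
  group_by(gene_id) |>
  mutate(
    # TRUE for a row that sits directly between an "Upstream" and a "Downstream" row
    sandwich = lag(exon_identity == 'Upstream') & lead(exon_identity == 'Downstream')
  ) |>
  ungroup() |>
  replace_na(list(sandwich = FALSE)) |>
  # sort within each gene so the "Event" row lands next to the row it replaces
  arrange(gene_id, start) |>
  filter(!sandwich) |>
  select(-sandwich)

(In the toy example, group_by() and ungroup() make no difference. They are there so that lag() and lead() never compare rows from two different genes in the real data set; sorting by gene_id before start likewise keeps each gene's rows together.)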
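The sandwich flag can also be computed per gene in base R, which makes it easy to inspect which rows it marks. This is a hypothetical helper (sandwich_flag()), not part of the answer itself. Using %in% instead of == treats NA as "no match", so no replace_na() step is needed in this variant.

```r
# exon_identity and gene_id columns from the question's toy data
exon <- c(NA, "Upstream", NA, "Downstream", "Event", NA, "Upstream", "Downstream", NA)
gene <- c("A", "A", "A", "A", "A", "B", "B", "B", "B")

# Hypothetical helper: TRUE for a row directly between "Upstream" and "Downstream"
sandwich_flag <- function(x) {
  n <- length(x)
  prev <- c(NA, x[-n])  # previous row, like dplyr::lag()
  nxt <- c(x[-1], NA)   # next row, like dplyr::lead()
  # %in% returns FALSE (not NA) for missing values
  prev %in% "Upstream" & nxt %in% "Downstream"
}

# Apply within each gene so the first/last row of a gene never sees a neighboring gene
flag <- unsplit(lapply(split(exon, gene), sandwich_flag), gene)
flag
```

Only row 3 is flagged; gene "B" has no row between its "Upstream" and "Downstream" rows, so nothing is removed there.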