Search code examples
rrowrow-number

Extracting data by row number after a set condition


I have a data.frame that is imported from an excel file which has been used with an irregular structure to make it visually appealing but the data is not usable. It is in repeated blocks of grouped data, with the word "Week" marking a new entry. I am creating a code to extract the relevant data. Here is a mwe

df = data.frame(x1 = c("Week", "Day", "Exercise", NA, NA, "Walk","Week", "Day", "Exercise", NA, NA, "Run"),
                x2 = c("1", "1",NA, "Advice", NA,NA,"1", "2",NA, "Advice", NA,NA) )
df
                x1     x2
1      Week      1
2       Day      1
3  Exercise   <NA>
4      <NA> Advice
5      <NA>   <NA>
6      Walk   <NA>
7      Week      1
8       Day      2
9  Exercise   <NA>
10     <NA> Advice
11     <NA>   <NA>
12      Run   <NA>

First I want to create avariable of "Week" and "Day" which will apply to the corresponding entry:

df = df%>%
  mutate(Week = case_when(x1 == "Week" ~ x2 ),
         Day =  case_when(x1 == "Day" ~ x2))%>%
  fill(c(Week, Day), .direction= "downup") # fill missing values (NA) with the preceding present value 

df
         x1     x2 Week Day
1      Week      1    1   1
2       Day      1    1   1
3  Exercise   <NA>    1   1
4      <NA> Advice    1   1
5      <NA>   <NA>    1   1
6      Walk   <NA>    1   1
7      Week      1    1   1
8       Day      2    1   2
9  Exercise   <NA>    1   2
10     <NA> Advice    1   2
11     <NA>   <NA>    1   2
12      Run   <NA>    1   2

Then I want to extract the Exercise that was done, which is always 3 rows below the word "Exercise" in x1.

The result should look like this

x1       x2     Week  Day   Exercise
   <fct>    <fct>  <fct> <fct> <fct>   
 1 Week     1      1     1     Walk    
 2 Day      1      1     1     Walk    
 3 Exercise NA     1     1     Walk    
 4 NA       Advice 1     1     Walk    
 5 NA       NA     1     1     Walk    
 6 Walk     NA     1     1     Walk    
 7 Week     1      1     1     Walk    
 8 Day      2      1     2     Run     
 9 Exercise NA     1     2     Run     
10 NA       Advice 1     2     Run     
11 NA       NA     1     2     Run     
12 Run      NA     1     2     Run  

How an I specify the row number after a condition and extract the data from a specified column in that row?


Solution

  • I like dplyr solutions and after searching found the function nth:

    df =df%>%
      group_by(Week, Day)%>%
      mutate(Exercise = nth(x1,(which(str_detect(x1, "Exercise"))+3)))
    

    which numbers the row where str_detect finds "Exercise". +3 to move on 3 nth can be used to find the data in that row number in x1