I have a data.frame that is imported from an excel file which has been used with an irregular structure to make it visually appealing but the data is not usable. It is in repeated blocks of grouped data, with the word "Week" marking a new entry. I am creating a code to extract the relevant data. Here is a mwe
df = data.frame(x1 = c("Week", "Day", "Exercise", NA, NA, "Walk","Week", "Day", "Exercise", NA, NA, "Run"),
x2 = c("1", "1",NA, "Advice", NA,NA,"1", "2",NA, "Advice", NA,NA) )
df
x1 x2
1 Week 1
2 Day 1
3 Exercise <NA>
4 <NA> Advice
5 <NA> <NA>
6 Walk <NA>
7 Week 1
8 Day 2
9 Exercise <NA>
10 <NA> Advice
11 <NA> <NA>
12 Run <NA>
First I want to create avariable of "Week" and "Day" which will apply to the corresponding entry:
df = df%>%
mutate(Week = case_when(x1 == "Week" ~ x2 ),
Day = case_when(x1 == "Day" ~ x2))%>%
fill(c(Week, Day), .direction= "downup") # fill missing values (NA) with the preceding present value
df
x1 x2 Week Day
1 Week 1 1 1
2 Day 1 1 1
3 Exercise <NA> 1 1
4 <NA> Advice 1 1
5 <NA> <NA> 1 1
6 Walk <NA> 1 1
7 Week 1 1 1
8 Day 2 1 2
9 Exercise <NA> 1 2
10 <NA> Advice 1 2
11 <NA> <NA> 1 2
12 Run <NA> 1 2
Then I want to extract the Exercise that was done, which is always 3 rows below the word "Exercise" in x1
.
The result should look like this
x1 x2 Week Day Exercise
<fct> <fct> <fct> <fct> <fct>
1 Week 1 1 1 Walk
2 Day 1 1 1 Walk
3 Exercise NA 1 1 Walk
4 NA Advice 1 1 Walk
5 NA NA 1 1 Walk
6 Walk NA 1 1 Walk
7 Week 1 1 1 Walk
8 Day 2 1 2 Run
9 Exercise NA 1 2 Run
10 NA Advice 1 2 Run
11 NA NA 1 2 Run
12 Run NA 1 2 Run
How an I specify the row number after a condition and extract the data from a specified column in that row?
I like dplyr
solutions and after searching found the function nth
:
df =df%>%
group_by(Week, Day)%>%
mutate(Exercise = nth(x1,(which(str_detect(x1, "Exercise"))+3)))
which
numbers the row where str_detect
finds "Exercise". +3 to move on 3
nth
can be used to find the data in that row number in x1