I have a df similar to the following:
df <- data.frame(name = c("billy","billy","sarah","sarah","sarah","sarah","sarah","sarah","sarah","linda","linda","linda","linda","linda","linda"),
data = c("soccer","8-01-1992","basketball","soccer","10-19-1995","10-25-1995","basketball","11-24-1995",6,"hockey","soccer","basketball","12-21-2001","12-30-2001","1-19-2002"),
event = c("joins.project","joins.when","joins.project","joins.project1","joins.when","joins.when1","participation.project","participation.when","participation.repetitions","joins.project","joins.project1","joins.project2","joins.when","joins.when1","joins.when2"))
> df
name data event
1 billy soccer joins.project
2 billy 8-01-1992 joins.when
3 sarah basketball joins.project
4 sarah soccer joins.project1
5 sarah 10-19-1995 joins.when
6 sarah 10-25-1995 joins.when1
7 sarah basketball participation.project
8 sarah 11-24-1995 participation.when
9 sarah 6 participation.repetitions
10 linda hockey joins.project
11 linda soccer joins.project1
12 linda basketball joins.project2
13 linda 12-21-2001 joins.when
14 linda 12-30-2001 joins.when1
15 linda 1-19-2002 joins.when2
My desired output is below.
new_df <- data.frame(name = c("billy", "sarah", "sarah", "sarah", "linda", "linda", "linda"),
join_or_particip = c("join", "join", "join", "participate", "join", "join", "join"),
sport = c("soccer", "basketball", "soccer", "basketball", "hockey", "soccer", "basketball"),
when = c("8-01-1992", "10-19-1995", "10-25-1995", "11-24-1995", "12-21-2001", "12-30-2001", "1-19-2002"),
repetitions = c(NA, NA, NA, 6, NA, NA, NA))
new_df
name join_or_particip sport when repetitions
1 billy join soccer 8-01-1992 NA
2 sarah join basketball 10-19-1995 NA
3 sarah join soccer 10-25-1995 NA
4 sarah participate basketball 11-24-1995 6
5 linda join hockey 12-21-2001 NA
6 linda join soccer 12-30-2001 NA
7 linda join basketball 1-19-2002 NA
I imagine I need to use spread()
to achieve this but I'm not sure how to account for the fact that certain rows in df
are linked (i.e., "joins.project1" and "joins.when1" go together). A bit of further context: this is a dataset of people who "joined" or "participated" in between 1 and 200+ "sports" on different dates. If they "participated" there is an additional variable called "repetitions" (i.e., how many times they participated).
dplyr
way
library(dplyr)
library(tidyr)
library(stringr)
df %>%
separate(event, into = c("join_or_particip", "type"), sep = "\\.") %>%
mutate(num = as.numeric(gsub('[^0-9.-]', '', type)) %>% replace_na(., 0),
type = gsub('[0-9]+', '', type)) %>%
pivot_wider(id_cols = c(name, join_or_particip, num), names_from = type, values_from = data) %>%
select(-num)
name join_or_particip project when repetitions
<chr> <chr> <chr> <chr> <chr>
1 billy joins soccer 8-01-1992 NA
2 sarah joins basketball 10-19-1995 NA
3 sarah joins soccer 10-25-1995 NA
4 sarah participation basketball 11-24-1995 6
5 linda joins hockey 12-21-2001 NA
6 linda joins soccer 12-30-2001 NA
7 linda joins basketball 1-19-2002 NA