I am cleaning a big dataset, that contains dates (t0:t5 and no_BL:no_48) and test scores (MMSE_BL: MMSE_FU48).
LONG STORY SHORT, I need this date: "27-09-12", to look like this "27-09-12_FU24_28",
Longer explanation: In this example, I am showing my first 2 rows and 18 columns. Probably the best way to tell you my goal is to show how I would like the outcome to look like:
### What I have
cnames <- c("t0", "t1", "t2", "t3", "t4", "t5", "no_BL", "no_FU6", "no_Fu12", "no_FU24", "no_FU36", "no_FU48", "MMSE_BL","MMSE_FU6" ,"MMSE_FU12", "MMSE_FU24", "MMSE_36", "MMSE_FU48")
one <- c("27-09-12", "NA", "25-07-13", "NA", "NA", "NA", "NA", "NA", "NA", "27-09-12", "25-07-13", "NA", "29", "NA", "28", "28", "29", "30")
two <- c("06-11-12", "NA", "01-10-13", "NA", "NA", "NA", "NA", "NA", "NA", "06-11-12", "01-10-13", "NA", "30", "NA", "29", "30", "30", "30")
df <- matrix(c(one, two), ncol = 18, byrow = TRUE )
colnames(df) <- c(cnames)
df <- as.data.frame(df)
View(df)
##### What I need
#OUTCOME
one_ <- c("27-09-12_FU24_28", "NA", "25-07-13_FU36_29", "NA", "NA", "NA", "NA", "NA", "NA", "27-09-12", "25-07-13", "NA", "29", "NA", "28", "28", "29", "30")
two_ <- c("06-11-12_FU24_30", "NA", "01-10-13_FU36_30", "NA", "NA", "NA", "NA", "NA", "NA", "06-11-12", "01-10-13", "NA", "NA" ,"30", "29", "30", "30", "30")
outcome <- matrix(c(one_, two_), ncol = 18, byrow = TRUE )
colnames(outcome) <- c(cnames)
outcome <- as.data.frame(outcome)
View(outcome)
It is a 2 step proccess,
First I would like to take the dates contained in columns t0:t5, go through columns no_BL:no_48, find their date match, and attach to the dates from t0:t5 columns the last characters of its date-belonging header.
Second after all the first 6 columns are settled, I have to start an iteration again, but this time attaching the scores to the end of the dates.
I was also contemplating the idea of Header=False, and work with it as a regular row.
Please help! and thank you in advance : )
I agree with Ronak's both points here. Though Ronak has proposed a fantastic answer which I can never imagine, yet I worked on your problem and so I propose alternate solution which is nowhere as good as that answer. It works
df %>% mutate(i = row_number()) %>%
pivot_longer(cols = contains("t"), names_to = "n", values_to = "v") %>%
pivot_longer(cols = contains("MM")) %>%
filter(!is.na(v)) %>%
rowwise() %>%
mutate(new = names(.)[which(c_across(1:6) == v)],
new = str_remove(new, "no_"),
new2 = case_when(str_detect(name, new) ~ value,
TRUE ~ NA_character_),
v = paste(v, new, new2, sep = "_")) %>%
filter(!is.na(new2)) %>%
select(i, n, v) %>%
pivot_wider(names_from = n, values_from = v) %>%
left_join(df[-c(1:6)] %>% mutate(i = row_number()), by = "i") %>%
select(-i)
# A tibble: 2 x 14
t0 t2 no_BL no_FU6 no_Fu12 no_FU24 no_FU36 no_FU48 MMSE_BL MMSE_FU6 MMSE_FU12 MMSE_FU24 MMSE_FU36
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 27-0~ 25-0~ NA NA NA 27-09-~ 25-07-~ NA 29 NA 28 28 29
2 06-1~ 01-1~ NA NA NA 06-11-~ 01-10-~ NA 30 NA 29 30 30
# ... with 1 more variable: MMSE_FU48 <chr>
Note Whenver there is sufficient data, the t_cols will automatically find a place in this syntax
modified df
> dput(df)
structure(list(t0 = c("27-09-12", "06-11-12"), t1 = c(NA_character_,
NA_character_), t2 = c("25-07-13", "01-10-13"), t3 = c(NA_character_,
NA_character_), t4 = c(NA_character_, NA_character_), t5 = c(NA_character_,
NA_character_), no_BL = c(NA_character_, NA_character_), no_FU6 = c(NA_character_,
NA_character_), no_Fu12 = c(NA_character_, NA_character_), no_FU24 = c("27-09-12",
"06-11-12"), no_FU36 = c("25-07-13", "01-10-13"), no_FU48 = c(NA_character_,
NA_character_), MMSE_BL = c("29", "30"), MMSE_FU6 = c(NA_character_,
NA_character_), MMSE_FU12 = c("28", "29"), MMSE_FU24 = c("28",
"30"), MMSE_FU36 = c("29", "30"), MMSE_FU48 = c("30", "30")), class = "data.frame", row.names = c(NA,
-2L))