I'm new to R. I have already written a few programs, but I have never run into the problem I describe below. Here is the TSE data frame I'm dealing with:
   ID TIME EVENT
1 150    1     A
2 150    2     B
3 150    2     C
4 150    2     D
5 151    1     C
6 151    2     A
7 151    3     B
8 151    3     D
This dataframe contains 3 variables :
ID : Id of the person,
TIME : Time index,
EVENT: An event that occurs at a certain moment of time.
When two or more events occur at the same time value (TIME) for the same ID, I want to keep only one of them, chosen according to a preference rule, and drop the other row(s). Let's suppose the rule is C > B > A > D (where ">" means "is preferred to").
So, in my example, I would like to keep only these rows :
   ID TIME EVENT
1 150    1     A
3 150    2     C
5 151    1     C
6 151    2     A
7 151    3     B
Rows 2, 4 and 8 are dropped because of the rule: rows 2 and 4 lose to C (ID 150, TIME 2), and row 8 loses to B (ID 151, TIME 3).
I guess this shouldn't be too tricky to program, but I really have no idea how to write it.
Thank you all in advance.
Jérémie P.
Here's a possible solution using dplyr.
First reproduce your data
DF <- data.frame(ID = rep(150:151, each = 4),
                 time = c(1, 2, 2, 2, 1, 2, 3, 3),
                 EVENT = c("A", "B", "C", "D", "C", "A", "B", "D"))
target_rule <- c("C", "B", "A", "D")
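To see how such a rule vector can encode the preference, here is a minimal sketch (the vector `events` is only an illustrative example, not part of your data). Turning the events into a factor whose levels follow the rule makes the integer codes behave like ranks, with 1 being the most preferred.

```r
target_rule <- c("C", "B", "A", "D")
events <- c("A", "B", "C", "D")   # illustrative vector only

# The factor levels follow the rule, so the integer codes are preference ranks
ranks <- as.integer(factor(events, levels = target_rule))
ranks
# [1] 3 2 1 4

# The most preferred event is the one with the smallest rank
events[which.min(ranks)]
# [1] "C"
```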
Then we can use a combination of commands from dplyr to group, order and select rows.
Below I use a factor version of your EVENT, with its levels in the order of your target rule, so that sorting on it follows the rule.
library("dplyr")
DF %>%
  group_by(ID, time) %>%                                   # Consider each combo of ID and time
  mutate(fevent = factor(EVENT, levels = target_rule)) %>% # Create a version of EVENT ordered by the rule
  arrange(fevent) %>%                                      # Sort according to the rule
  summarise(EVENT = first(EVENT)) %>%                      # Keep just the first (most preferred) event
  ungroup() %>%
  arrange(ID)
This produces the following (on R 4.0 or later, where data.frame() no longer converts strings to factors, EVENT shows as <chr> instead of <fct>):
# A tibble: 5 x 3
     ID  time EVENT
  <int> <dbl> <fct>
1   150     1 A
2   150     2 C
3   151     1 C
4   151     2 A
5   151     3 B
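If you would rather not depend on dplyr, the same idea can be sketched in base R. This is only one possible approach, not necessarily the best one: it sorts the rows by ID, time and the event's position in the rule, then keeps the first row of each ID/time combination. The names `ord`, `sorted` and `kept` are my own choices for illustration.

```r
DF <- data.frame(ID = rep(150:151, each = 4),
                 time = c(1, 2, 2, 2, 1, 2, 3, 3),
                 EVENT = c("A", "B", "C", "D", "C", "A", "B", "D"),
                 stringsAsFactors = FALSE)
target_rule <- c("C", "B", "A", "D")

# Sort by ID and time, then by the event's position in the rule (1 = most preferred)
ord <- order(DF$ID, DF$time, match(DF$EVENT, target_rule))
sorted <- DF[ord, ]

# Keep only the first row of each ID/time combination
kept <- sorted[!duplicated(sorted[c("ID", "time")]), ]

kept$EVENT
# [1] "A" "C" "C" "A" "B"
rownames(kept)
# [1] "1" "3" "5" "6" "7"
```

Because the original row names are preserved, you can check directly that rows 2, 4 and 8 are the ones that were dropped.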