For reporting purposes, I have a data frame defined like this:
Data:
df_ischemia: 12 obs. of 2 variables
record_id : 'labelled' chr "1001" "1001" "1001" "1001" "1002" ...
..- attr(*, "label")= chr "Patient number"
ischemic: Factor w/ 2 levels "Unchecked","Checked": NA NA 1 1 NA 2 NA 1 NA 2 ...
..- attr(*, "redcapLabels")= chr [1:2] "Unchecked" "Checked"
..- attr(*, "redcapLevels")= int [1:2] 0 1
..- attr(*, "label")= chr "Complication(s): Ischemia"
The real data frame has a couple of hundred rows, but for this example let's say it's got just 12 rows like this:
| record_id | ischemic
1 | 1001 | NA
2 | 1001 | NA
3 | 1001 | Unchecked
4 | 1001 | Unchecked
5 | 1002 | NA
6 | 1002 | Checked
7 | 1003 | NA
8 | 1003 | Unchecked
9 | 1004 | NA
10 | 1004 | Checked
11 | 1004 | Checked
12 | 1004 | Checked
The goal is to group by patient and keep only the patients with a 'Checked' value, so the expected output is:
| record_id | ischemic
1 | 1002 | Checked
2 | 1004 | Checked
I figured I could just use group_by and max:
df_ischemia <- group_by(record_id) %>% max(df_ischemia$ischemic)
# Error: object 'record_id' not found
df_ischemia <- group_by(df_ischemia$record_id) %>% max(ischemic)
# no applicable method for 'group_by_' applied to an object of class "c('labelled', 'character')"
df_ischemia <- group_by(record_id) %>% summarise(df_ischemia$ischemic=max(df_ischemia$ischemic))
# Error: unexpected '=' ..
None of these work. Still, the factor is stored as integer codes, so taking a max should be possible(?). I read somewhere that the factor needs to be ordered. The levels look ordered, but I have no clue how to check whether that is the case, or how to set the order of an existing factor.
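A minimal base R sketch of checking and setting the order, assuming the 12 sample values from the table above. The REDCap attributes are left out here; rebuilding the real column with factor() may drop attributes such as "label", so check them afterwards.

```r
# Sample values from the 12-row example (REDCap attributes omitted)
ischemic <- factor(
  c(NA, NA, "Unchecked", "Unchecked", NA, "Checked",
    NA, "Unchecked", NA, "Checked", "Checked", "Checked"),
  levels = c("Unchecked", "Checked")
)

is.ordered(ischemic)      # FALSE: a level order alone does not make it ordered
# max(ischemic) would fail here: 'max' not meaningful for factors

# Rebuild the existing factor as an ordered one, stating the order explicitly
ischemic_ord <- factor(ischemic, levels = c("Unchecked", "Checked"), ordered = TRUE)
is.ordered(ischemic_ord)  # TRUE

# max() is defined for ordered factors; na.rm = TRUE skips the NA rows
max(ischemic_ord, na.rm = TRUE)  # returns "Checked" (as an ordered factor)
```

With the column ordered this way, max(ischemic, na.rm = TRUE) can be used inside summarise instead of the integer-code workaround.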
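We need summarise in the first case. Because max() is not meaningful for an unordered factor, take which.max() of the integer codes and use it to index the factor: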
library(dplyr)
df_ischemia %>%
  group_by(record_id) %>%
  summarise(Max = ischemic[which.max(as.integer(ischemic))])
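The summarise call above returns one row per patient, including patients whose highest value is 'Unchecked' (1001 and 1003). A self-contained sketch that adds a filter to reach the expected output, assuming the 12-row sample rebuilt as a plain data frame (record_id as an ordinary character vector rather than the 'labelled' class of the real data):

```r
library(dplyr)

# Rebuild the 12-row sample; no 'labelled' class or REDCap attributes here
df_ischemia <- data.frame(
  record_id = rep(c("1001", "1002", "1003", "1004"), times = c(4, 2, 2, 4)),
  ischemic = factor(
    c(NA, NA, "Unchecked", "Unchecked", NA, "Checked",
      NA, "Unchecked", NA, "Checked", "Checked", "Checked"),
    levels = c("Unchecked", "Checked")
  ),
  stringsAsFactors = FALSE
)

df_checked <- df_ischemia %>%
  group_by(record_id) %>%
  # which.max() ignores NA and gives the position of the highest code per group;
  # indexing the factor with it keeps the label instead of the integer code
  summarise(ischemic = ischemic[which.max(as.integer(ischemic))]) %>%
  # keep only patients whose highest value is "Checked"
  filter(ischemic == "Checked")

df_checked
```

If a patient's rows were all NA, which.max() would return integer(0) for that group, so no single summary value could be produced; the sample data has no such patient.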
The <- is in the wrong place, i.e. group_by is applied to a column 'record_id' without specifying the data 'df_ischemia'. After grouping, max is applied to the full column 'ischemic' (through df_ischemia$ischemic) rather than to each group. Also, in the chain the pipe passes the grouped data frame as the first argument of max, so max(df_ischemia$ischemic) is not evaluated on its own either.
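In the second attempt, there is the same issue: group_by receives the column vector df_ischemia$record_id instead of the data frame, and max is applied without summarise. In the last attempt, there is again the missing-data issue ('record_id' not found), along with the extraction of the full column: $ extracts the full column, breaking the grouping. In addition, df_ischemia$ischemic = is not a valid argument name inside summarise, which is what triggers the unexpected '=' error.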
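To make the $ point concrete, a small hypothetical example (df_demo, g and x are made up for illustration) comparing a bare column name with df_demo$x inside summarise:

```r
library(dplyr)

# Hypothetical data: group "a" has three rows, group "b" has one
df_demo <- data.frame(g = c("a", "a", "a", "b"), x = c(1, 2, 3, 10))

res <- df_demo %>%
  group_by(g) %>%
  summarise(
    n_grouped   = length(x),        # bare name: evaluated within each group
    n_full      = length(df_demo$x), # df_demo$x: always the whole column
    max_grouped = max(x),
    max_full    = max(df_demo$x)
  )

res
```

With df_demo$x, every group sees all four rows, so n_full and max_full are identical for both groups, while the bare-name columns differ per group.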