I have data on repeated measurements of 8 patients, each with a varying number of repeated measurements of the same variables. The measured variables are sex, blood pressure (sys_bp), and the number of CT scans a person underwent (ct_amount):
library(dplyr)
library(magrittr)
questiondata <- structure(list(id = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4,
4, 7, 7, 8, 8, 8, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 20,
20, 20), time = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 6L, 1L, 2L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 4L), .Label = c("T0", "T1M0", "T1M6",
"T1M12", "T2M0", "FU1"), class = "factor"), sys_bp = c(116, 125.8,
NA, NA, NA, 113.2, NA, NA, NA, NA, 146, NA, NA, NA, NA, NA, NA,
125, NA, NA, 164.5, NA, NA, NA, NA, 150.5, NA, NA, NA, NA, 158,
NA), sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("female", "male"), class = "factor"),
ct_amount = c(4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 3L, 3L, 3L)), row.names = c(NA, -32L), class = c("tbl_df",
"tbl", "data.frame"))
questiondata
#    id time  sys_bp sex    ct_amount
#    <dbl> <fct> <dbl> <fct>  <int>
#  1 2 T0 116 female 4
#  2 2 T1M0 126. female 4
#  3 2 T1M6 NA female 4
#  4 2 T1M12 NA female 4
#  5 3 T0 NA female 5
#  6 3 T1M0 113. female 5
#  7 3 T1M6 NA female 5
#  8 3 T1M12 NA female 5
#  9 3 T2M0 NA female 5
# 10 4 T0 NA male 5
# 11 4 T1M0 146 male 5
# 12 4 T1M6 NA male 5
# 13 4 T1M12 NA male 5
# 14 4 T2M0 NA male 5
# 15 7 T0 NA female 2
# 16 7 FU1 NA female 2
# 17 8 T0 NA female 3
# 18 8 T1M0 125 female 3
# 19 8 T2M0 NA female 3
# 20 13 T0 NA female 5
# 21 13 T1M0 164. female 5
# 22 13 T1M6 NA female 5
# 23 13 T1M12 NA female 5
# 24 13 T2M0 NA female 5
# 25 14 T0 NA male 5
# 26 14 T1M0 150. male 5
# 27 14 T1M6 NA male 5
# 28 14 T1M12 NA male 5
# 29 14 T2M0 NA male 5
# 30 20 T0 NA female 3
# 31 20 T1M0 158 female 3
# 32 20 T1M12 NA female 3
I am trying to count the number of persons who (1) are male/female and (2) have 1/2/3/4/5 CT scans.
So the expected output is (1) 6 females and 2 males, and (2) 1 person with 2 CTs, 2 persons with 3 CTs, 1 person with 4 CTs and 4 persons with 5 CTs.
I've tried many combinations of group_by(), summarise() and count(), but can't seem to get it right. Any help?
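The reason a plain count(sex) on the full data does not work is that it counts rows (measurements), not persons. A minimal sketch of the group_by() + summarise() route, assuming the goal is one count per patient, uses n_distinct(id) so that each id is counted once per group. To keep the block self-contained it rebuilds only the id, sex and ct_amount columns of questiondata with rep() (time and sys_bp are not needed); the names toy, n_rows, by_sex and by_ct are illustrative.

```r
library(dplyr)

# Compact rebuild of the id, sex and ct_amount columns of questiondata
n_rows <- c(4, 5, 5, 2, 3, 5, 5, 3)  # number of measurements per patient
toy <- tibble(
  id        = rep(c(2, 3, 4, 7, 8, 13, 14, 20), times = n_rows),
  sex       = rep(c("female", "female", "male", "female",
                    "female", "female", "male", "female"), times = n_rows),
  ct_amount = rep(c(4L, 5L, 5L, 2L, 3L, 5L, 5L, 3L), times = n_rows)
)

# Naive: counts rows (measurements), so patients with more visits weigh more
naive <- toy %>% count(sex)

# Per person: count each distinct id once within every group
by_sex <- toy %>% group_by(sex) %>% summarise(n = n_distinct(id))
by_ct  <- toy %>% group_by(ct_amount) %>% summarise(n = n_distinct(id))

naive
by_sex
by_ct
```

Here naive reports 22 female and 10 male rows, while by_sex and by_ct give the per-person counts listed in the question.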
You can first keep only one row per id with distinct() (this works because sex and ct_amount are the same on every row of a given patient), then use count() to get the output.
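library(dplyr)
# Keep one row per patient (the first row of each id, with all columns)
unique_data <- questiondata %>% distinct(id, .keep_all = TRUE)
# Number of patients per sex
unique_data %>% count(sex)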
# A tibble: 2 x 2
# sex n
# <fct> <int>
#1 female 6
#2 male 2
# Number of patients per number of CT scans
unique_data %>% count(ct_amount)
# A tibble: 4 x 2
# ct_amount n
# <int> <int>
#1 2 1
#2 3 2
#3 4 1
#4 5 4
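Because distinct(id, .keep_all = TRUE) keeps only the first row of each id, the counts above are correct only if sex and ct_amount never change within a patient. A quick check, sketched on the same compact rebuild of those three columns (toy, per_id and all_constant are illustrative names), counts the distinct values of each column per id:

```r
library(dplyr)

# Compact rebuild of the id, sex and ct_amount columns of questiondata
n_rows <- c(4, 5, 5, 2, 3, 5, 5, 3)  # number of measurements per patient
toy <- tibble(
  id        = rep(c(2, 3, 4, 7, 8, 13, 14, 20), times = n_rows),
  sex       = rep(c("female", "female", "male", "female",
                    "female", "female", "male", "female"), times = n_rows),
  ct_amount = rep(c(4L, 5L, 5L, 2L, 3L, 5L, 5L, 3L), times = n_rows)
)

# Distinct values of sex and ct_amount within each patient;
# a value above 1 means keeping the first row would hide a change
per_id <- toy %>%
  group_by(id) %>%
  summarise(n_sex = n_distinct(sex), n_ct = n_distinct(ct_amount))

all_constant <- all(per_id$n_sex == 1 & per_id$n_ct == 1)
all_constant
```

If all_constant were FALSE, you would first have to decide which row should represent each patient before counting.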