I have a tibble with one row per observation. The columns hold variables such as ID number, DOB and test results:
ID | DOB | result |
a | 1940-01-01 | 15 |
a | 1940-01-01 | 17 |
b | 1933-05-20 | 11 |
b | 1933-05-20 | 20 |
I want to make a histogram of the patients' ages, but I can only get the histogram to count every occurrence of the DOB, so I end up with n = patients * observations per patient instead of n = patients.
I tried:
ggplot(d1, aes(eeptools::age_calc(dob = as.Date(DOB), enddate = Sys.Date(), units = 'years'))) + geom_histogram(binwidth = 1)
How do I subset so I only get one DOB for each ID? Thanks!
If you are not interested in the result column, you could simply drop it with subset() and then use distinct() to remove all duplicate rows. I am a bit unsure what you mean by years (is it age in years or year of birth?), but using years as age since today, I got this:
# Import packages
library(dplyr)
library(ggplot2)
# Make dataframe
df <- data.frame(ID = c("a", "a", "b", "b"),
                 DOB = c("1940-01-01", "1940-01-01", "1933-05-20", "1933-05-20"),
                 result = c(15, 17, 11, 20))
# Convert DOB to Date class (it most likely already is in your data)
df %>%
  mutate(date = as.Date(DOB),
         years = lubridate::year(date),
         # Approximate age: current year minus year of birth
         age = lubridate::year(Sys.Date()) - years) %>%
  # Subset data to remove results
  subset(select = -result) %>%
  # Remove duplicates using distinct
  distinct() %>%
  # Plot
  ggplot(aes(x = age)) +
  geom_histogram(bins = 2)
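If your real data has more columns than in the example, dropping only result may not be enough for distinct() to collapse each patient to one row. A possible variant is to name the columns that identify a patient directly in distinct(). The sketch below assumes that DOB is constant within each ID and that lubridate is installed; the object names d1 and patients are only for illustration, and the age is computed as completed years up to today rather than as a difference of calendar years.

```r
library(dplyr)
library(ggplot2)

# Toy data copied from the question
d1 <- data.frame(ID = c("a", "a", "b", "b"),
                 DOB = c("1940-01-01", "1940-01-01", "1933-05-20", "1933-05-20"),
                 result = c(15, 17, 11, 20))

# Keep one row per patient (assumes DOB never varies within an ID);
# distinct(ID, DOB) also drops every other column
patients <- d1 %>%
  distinct(ID, DOB) %>%
  mutate(DOB = as.Date(DOB),
         # Completed years between DOB and today
         age = floor(lubridate::time_length(
           lubridate::interval(DOB, Sys.Date()), "years")))

# Histogram with one count per patient
p <- ggplot(patients, aes(x = age)) +
  geom_histogram(binwidth = 1)
p
```

With binwidth = 1, as in the question, each bar covers one year of age, which is probably more useful than a fixed number of bins once you have more than a handful of patients.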