Mean number of kids by mom's age?

So I have this DF showing moms' ages, and the b3_01-b3_10 columns correspond to the kids' birthdates (converted to something called the "century month code", not important here). Anyway, I'm trying to calculate the mean number of kids by mom's age. So for example, the 37-year-old mom in line 2 has 2 kids, since there are values in the b3_01 and b3_02 columns.

The desired output is 1 column indicating the moms' ages and a 2nd column with the mean number of children.

This is what I have so far (I don't know what this code actually calculates tbh), but what I really want is the mean number of kids by mom's age:

> Kids_Mean <- PHBS18 %>%
>   select(mom_age, b3_01, b3_02, b3_03, b3_04, b3_05, b3_06, b3_07, b3_08, b3_09, b3_10) %>%
>   group_by(mom_age) %>%
>   add_count(mom_age, b3_01, b3_02, b3_03, b3_04, b3_05, b3_06, b3_07, b3_08, b3_09, b3_10)

Sample data (dput output):

structure(list(case_id = 1:20, person_id = c(1, 2, 2, 2, 2, 1, 
1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), household_id = c(1, 
6, 7, 19, 27, 30, 31, 33, 36, 42, 44, 45, 46, 47, 50, 52, 54, 
59, 63, 64), year = c(2018, 2018, 2018, 2018, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
2018, 2018), month = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1), survey_date_cmc = c(1417, 1417, 1417, 1417, 
1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 
1417, 1417, 1417, 1417, 1417), mom_age = c(28, 37, 59, 31, 45, 
61, 46, 61, 37, 54, 38, 26, 63, 64, 58, 29, 61, 56, 49, 32), 
    mom_dob_cmc = c(1081, 973, 709, 1045, 877, 685, 865, 685, 
    973, 769, 961, 1105, 661, 649, 721, 1069, 685, 745, 829, 
    1033), b3_01 = c(NA, 1297, 1189, 1405, 1297, NA, 1321, NA, 
    1345, NA, 1321, 1381, NA, NA, NA, NA, NA, NA, NA, 1405), 
    b3_02 = c(NA, 1297, NA, 1393, 1225, NA, 1297, NA, 1285, NA, 
    1249, 1333, NA, NA, NA, NA, NA, NA, 1105, 1393), b3_03 = c(NA, 
    NA, NA, NA, 1189, NA, 1201, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), b3_04 = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1081, NA), b3_05 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), b3_06 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_), b3_07 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), b3_08 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_), b3_09 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), b3_10 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame")) 


  • I think this does it:

    df %>% 
      pivot_longer(starts_with("b3")) %>%  # Move from wide to long format
      group_by(mom_age, case_id) %>%  # Group by each case, keeping mom age for later
      summarise(n_kids = sum(!is.na(value))) %>%  # Find the number of kids in each case
      summarise(mean_kids = mean(n_kids))  # Find mean for each mom age
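
    If you'd rather not reshape, here is a rough alternative sketch: count the non-missing b3_* values in each row with rowSums(), then average by mom_age. The toy data frame below is made up for illustration (it only mimics the layout of the question's data), and this assumes dplyr 1.0 or later for across().

```r
library(dplyr)

# Toy data in the same layout as the question; the values are illustrative only
df <- tibble(
  case_id = 1:4,
  mom_age = c(37, 37, 45, 28),
  b3_01 = c(1297, 1345, 1297, NA),
  b3_02 = c(1297, 1285, 1225, NA),
  b3_03 = c(NA, NA, 1189, NA)
)

kids_mean <- df %>%
  # Each non-NA birthdate column counts as one child for that row
  mutate(n_kids = rowSums(!is.na(across(starts_with("b3"))))) %>%
  group_by(mom_age) %>%
  # Average the per-mom counts within each age
  summarise(mean_kids = mean(n_kids))

print(kids_mean)
```

    Moms with no birthdates at all get n_kids = 0 and still count toward the mean for their age, same as in the pivot_longer version.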