Search code examples
rmultiple-columnssummarizemultiple-conditions

How to create two columns that count the total number of two conditions


I have a diabetes dataset that has a column called Outcome and only has two values, 1 = Diabetes, 0 = Non-Diabetes. I want to count the total number of 1's and 0's based on age and then have a % of 1's based on age.

I have this code below:

by_age1 <- 
  diabetes.df %>%
  select(Age, Outcome) %>%
  group_by(Age,Outcome) %>%
  summarize(Diabetes_Count = n()) %>%
  filter(Outcome=="1"| Outcome == "0")

This code generates this table

Age | Outcome | Count
21      0        58
21      1         5    

And so on

I want the table to look like this though

Age | Count_Outcome=1 | Count_Outcome=0
21          5                 58
22          11                61

So I can eventually get to this

Age | Count_Outcome=1 | Count_Outcome=0 | Count_Outcome=1/Count_Outcome=0
21          5                 58                    0.086
22          11                61                    0.180

Here is the dataset

Rows: 768
Columns: 23
$ Pregnancies              <int> 6, 1, 8, 1, 0, 5, 3, 10, 2, 8, 4, 10, 10, 1, 5, 7, 0, 7, 1, 1, 3, 8, 7, 9, 11, 10, 7, 1, 13, 5, 5, 3, ...
$ Glucose                  <int> 148, 85, 183, 89, 137, 116, 78, 115, 197, 125, 110, 168, 139, 189, 166, 100, 118, 107, 103, 115, 126, ...
$ BloodPressure            <int> 72, 66, 64, 66, 40, 74, 50, 0, 70, 96, 92, 74, 80, 60, 72, 0, 84, 74, 30, 70, 88, 84, 90, 80, 94, 70, ...
$ SkinThickness            <int> 35, 29, 0, 23, 35, 0, 32, 0, 45, 0, 0, 0, 0, 23, 19, 0, 47, 0, 38, 30, 41, 0, 0, 35, 33, 26, 0, 15, 19...
$ Insulin                  <int> 0, 0, 0, 94, 168, 0, 88, 0, 543, 0, 0, 0, 0, 846, 175, 0, 230, 0, 83, 96, 235, 0, 0, 0, 146, 115, 0, 1...
$ BMI                      <dbl> 33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3, 30.5, 0.0, 37.6, 38.0, 27.1, 30.1, 25.8, 30.0, 45.8, 2...
$ DiabetesPedigreeFunction <dbl> 0.627, 0.351, 0.672, 0.167, 2.288, 0.201, 0.248, 0.134, 0.158, 0.232, 0.191, 0.537, 1.441, 0.398, 0.58...
$ Age                      <int> 50, 31, 32, 21, 33, 30, 26, 29, 53, 54, 30, 34, 57, 59, 51, 32, 31, 31, 33, 32, 27, 50, 41, 29, 51, 41...
$ Outcome                  <int> 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, ...
$ Skin.log                 <dbl> 3.555634, 3.367641, -4.605170, 3.135929, 3.555634, -4.605170, 3.466048, -4.605170, 3.806885, -4.605170...
$ Insulin.log              <dbl> -2.302585, -2.302585, -2.302585, 4.544358, 5.124559, -2.302585, 4.478473, -2.302585, 6.297293, -2.3025...
$ DPF.log                  <dbl> -0.46680874, -1.04696906, -0.39749694, -1.78976147, 0.82767807, -1.60445037, -1.39432653, -2.00991548,...
$ Preg.log                 <dbl> 1.793424749, 0.009950331, 2.080690761, 0.009950331, -4.605170186, 1.611435915, 1.101940079, 2.30358459...
$ Age.log                  <dbl> 3.912023, 3.433987, 3.465736, 3.044522, 3.496508, 3.401197, 3.258097, 3.367296, 3.970292, 3.988984, 3....
$ G                        <dbl> 0.84777132, -1.12266474, 1.94245802, -0.99755769, 0.50372693, -0.15308509, -1.34160209, -0.18436186, 2...
$ BP                       <dbl> 0.14954330, -0.16044119, -0.26376935, -0.16044119, -1.50370731, 0.25287146, -0.98706650, -3.57027057, ...
$ S                        <dbl> 0.7143403, 0.6624894, -1.5365134, 0.5985804, 0.7143403, -1.5365134, 0.6896315, -1.5365134, 0.7836385, ...
$ I                        <dbl> -1.0157459, -1.0157459, -1.0157459, 0.8904827, 1.0520140, -1.0157459, 0.8721398, -1.0157459, 1.3785101...
$ D                        <dbl> 0.76534970, -0.13507072, 0.87292300, -1.28789940, 2.77441913, -1.00029287, -0.67417647, -1.62958283, -...
$ BM                       <dbl> 0.20387991, -0.68397621, -1.10253696, -0.49372133, 1.40882750, -0.81081280, -0.12589522, 0.41950211, -...
$ P                        <dbl> 0.6504082, -0.1684863, 0.7823084, -0.1684863, -2.2875506, 0.5668468, 0.3329083, 0.8846516, 0.1474983, ...
$ A                        <dbl> 1.43544387, -0.04590939, 0.05247453, -1.25279578, 0.14783077, -0.14751959, -0.59096525, -0.25257485, 1...
$ Segment                  <int> 4, 3, 2, 3, 5, 2, 3, 1, 4, 2, 2, 2, 2, 4, 4, 1, 5, 2, 3, 3, 4, 2, 2, 3, 4, 4, 2, 3, 4, 2, 4, 4, 3, 2, ...
``

Solution

  • Random data:

    r <- function(x) {rnorm(x, 50, 2)}
    set.seed(123)
    diabetes.df <- data.frame(Age = round(r(10)), Outcome = as.character((r(10) < 50)*1))
    
    > diabetes.df
       Age Outcome
    1   49       0
    2   50       0
    3   53       0
    4   50       0
    5   50       1
    6   53       0
    7   51       0
    8   47       1
    9   49       0
    10  49       1
    

    Then pivot_wider() will do what you want:

    df <- diabetes.df %>%
      select(Age, Outcome) %>%
      group_by(Age,Outcome) %>%
      dplyr::summarize(Diabetes_Count = n()) %>%
      filter(Outcome=="1"| Outcome == "0")
    
    df = pivot_wider(df, names_from = c("Outcome"), values_from = "Diabetes_Count", names_prefix = "Outcome_", values_fill = 0)
    
    > df
    # A tibble: 5 x 3
    # Groups:   Age [5]
        Age Outcome_1 Outcome_0
      <dbl>     <int>     <int>
    1    47         1         0
    2    49         1         2
    3    50         1         2
    4    51         0         1
    5    53         0         2
    
    > df %>% mutate(`Outcome_1/Outcome_0` = Outcome_1 / Outcome_0)
    # A tibble: 5 x 4
    # Groups:   Age [5]
        Age         Outcome_1         Outcome_0  `Outcome_1/Outcome_0`
      <dbl>             <int>             <int>                 <dbl>
    1    47                 1                 0                 Inf  
    2    49                 1                 2                 0.5
    3    50                 1                 2                 0.5
    4    51                 0                 1                 0  
    5    53                 0                 2                 0