Group by in R questions: Tibble, decimal places, and show sample size?


Here's my data frame "fulldays". (The header Plastic belongs to the second column of numbers; the first column is just the row numbers that R adds, which I didn't know how to remove.)

   Plastic Age Ones Zeros Nonzeros CellsCounted AllDaysAvail
1        2  10    2     5        5           10         TRUE
2       57   8    4     2        8           10         TRUE
3        3   9    2     4        6           10         TRUE
4       81   9    3     1        9           10         TRUE
5      131  20    8     1        9           10         TRUE
6        5   8    5     5        5           10         TRUE
7       26  10    4     4        6           10         TRUE
8       76  12    2     6        4           10         TRUE
9        9   9    8     2        8           10         TRUE
10      36  14    2     5        5           10         TRUE
11      64  12    3     4        6           10         TRUE
12      74  22    5     4        6           10         TRUE
13      10  10    1     4        6           10         TRUE
14      21   9    7     3        7           10         TRUE
15      16   9    5     3        7           10         TRUE
17      18   8    4     3        7           10         TRUE
18      23  22    6     4        6           10         TRUE
19     106  11    2     1        9           10         TRUE
20     113   9    1     4        6           10         TRUE
21      24  11    2     5        5           10         TRUE
22      29   9    3     2        8           10         TRUE
23      85   9    6     4        6           10         TRUE
24     403  19    1     6        4           10         TRUE
25      25  19    1     2        8           10         TRUE
26      27  10    3     3        7           10         TRUE
27     121   7    7     3        7           10         TRUE
29      35  12    1     4        6           10         TRUE
30      39  18    2     6        4           10         TRUE
31      37   8    5     1        9           10         TRUE
32      63   7    8     2        8           10         TRUE
33     122  11    3     2        8           10         TRUE
34     148   9    4     4        6           10         TRUE
37      42  13    2     3        7           10         TRUE
38     144  12    0     9        1           10         TRUE
39      43  12    1     2        8           10         TRUE
40      47  20    6     4        6           10         TRUE
41      90  12    2     5        5           10         TRUE
42     119  12    2     4        6           10         TRUE
43     138   7    7     3        7           10         TRUE
44      56   4    7     3        7           10         TRUE
45      58  12    2     5        5           10         TRUE
46      60  22    3     4        6           10         TRUE
47      71   9    2     5        5           10         TRUE
48     288  18    0    10        0           10         TRUE
49      66  22    1     5        5           10         TRUE
50      67   9    0     8        2           10         TRUE
51     149  12    0     5        5           10         TRUE
52      70  14    5     4        6           10         TRUE
53      72  12    1     4        6           10         TRUE
54      78  12    0     4        6           10         TRUE
59      79  12    4     3        7           10         TRUE
60      83  11    4     4        6           10         TRUE
61      87   8    6     4        6           10         TRUE
63      92  11    1     4        6           10         TRUE
64      96   8    0     5        5           10         TRUE
65     125   7    7     3        7           10         TRUE
66      98   9    3     4        6           10         TRUE
67     107   6    2     3        7           10         TRUE
68     102  11    5     3        7           10         TRUE
69     103  10    0     1        9           10         TRUE
72     108  12    3     3        7           10         TRUE
73     153  12    4     3        7           10         TRUE
74     109  12    3     4        6           10         TRUE
75     118  10    4     5        5           10         TRUE
77     133  12    0     4        6           10         TRUE
79     157   8    0    10        0           10         TRUE
81     318  14    2     5        5           10         TRUE

I have this code:

library(dplyr)

new_data <- fulldays %>%
  group_by(Age) %>%
  summarize(OnesMean = mean(Ones), ZerosMean = mean(Zeros), NonZeroMean = mean(Nonzeros))

This is the output "new_data" (again, Age starts on the second column, not the first):

     Age OnesMean ZerosMean NonZeroMean
   <int>    <dbl>     <dbl>       <dbl>
 1     4     7         3           7   
 2     6     2         3           7   
 3     7     7.25      2.75        7.25
 4     8     3.43      4.29        5.71
 5     9     3.67      3.67        6.33
 6    10     2.33      3.67        6.33
 7    11     2.83      3.17        6.83
 8    12     1.75      4.31        5.69
 9    13     2         3           7   
10    14     3         4.67        5.33
11    18     1         8           2   
12    19     1         4           6   
13    20     7         2.5         7.5 
14    22     3.75      4.25        5.75

I have three questions:

  1. Why is group_by() creating a tibble and not a data frame?
  2. Why, when I click on "new_data" as an object in the "Data" section, does it display the values with so many decimal places? (See image: https://i.sstatic.net/724G9.png)
  3. How can I add/bind a column to new_data that shows the sample size for each age? In other words, I want to know how many individuals contribute to the mean score in each column (ideally I would add this column between "Age" and "OnesMean").

Thank you so much, and please let me know if there are any questions!


Solution

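    1. group_by() returns a grouped tibble, and summarise() on it returns a tibble. A tibble is still a data.frame, just with a few differences (mostly in printing and subsetting), so it works anywhere a data.frame does. If you need a plain data.frame, wrap the result in as.data.frame().
    2. mean() doesn't round its result. The console print of a tibble shows only about three significant digits by default, while the Data viewer shows the stored values with more digits. If you want them rounded, you need to ask R to do that, with OnesMean = round(mean(Ones), 2), for example.
    3. Add n() as one of the arguments of summarise(), and name it so the column gets a readable header. Arguments become columns in the order given, so putting it first places it between Age and OnesMean: summarise(n = n(), OnesMean = mean(Ones), <...>).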
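
Putting the three points together, here is a minimal sketch rather than a definitive version. It assumes dplyr is installed and rebuilds only the first six rows of "fulldays" from the question (just the columns used) so that it runs on its own. The column name n and the choice of two decimal places are illustrative, not something the question prescribes.

```r
library(dplyr)

# First six rows of `fulldays` from the question, limited to the columns used here.
fulldays <- data.frame(
  Age      = c(10, 8, 9, 9, 20, 8),
  Ones     = c(2, 4, 2, 3, 8, 5),
  Zeros    = c(5, 2, 4, 1, 1, 5),
  Nonzeros = c(5, 8, 6, 9, 9, 5)
)

new_data <- fulldays %>%
  group_by(Age) %>%
  summarise(
    n           = n(),                        # sample size per Age; listed first so it follows Age
    OnesMean    = round(mean(Ones), 2),       # rounding is explicit; mean() itself never rounds
    ZerosMean   = round(mean(Zeros), 2),
    NonZeroMean = round(mean(Nonzeros), 2)
  ) %>%
  as.data.frame()                             # convert the tibble back to a plain data.frame

class(new_data)
print(new_data)
```

Rounding inside summarise() changes the stored values, not just how they are displayed. If the full precision is needed later, it may be better to keep the unrounded means and round only when printing or exporting.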