Search code examples
rdplyr

dplyr summarise returns multiple rows per group?


I suspect I'm doing something stupid here, but I'm lost. In the interest of reproducing my exact issue, there's a very large dput() here...apologies in advance.

Data:

hofs <- structure(list(Name = c("Hank Aaron HOF", "Roberto Alomar HOF", 
"Cap Anson HOF", "Luis Aparicio HOF", "Luke Appling HOF", "Richie Ashburn HOF", 
"Earl Averill HOF", "Home Run Baker HOF", "Dave Bancroft HOF", 
"Ernie Banks HOF", "Jake Beckley HOF", "Johnny Bench HOF", "Yogi Berra HOF", 
"Wade Boggs HOF", "Jim Bottomley HOF", "Lou Boudreau HOF", "Roger Bresnahan HOF", 
"George Brett HOF", "Lou Brock HOF", "Dan Brouthers HOF", "Willard Brown HOF", 
"Jesse Burkett HOF", "Roy Campanella HOF", "Rod Carew HOF", "Max Carey HOF", 
"Gary Carter HOF", "Orlando Cepeda HOF", "Frank Chance HOF", 
"Fred Clarke HOF", "Roberto Clemente HOF", "Ty Cobb HOF", "Mickey Cochrane HOF", 
"Eddie Collins HOF", "Jimmy Collins HOF", "Earle Combs HOF", 
"Roger Connor HOF", "Sam Crawford HOF", "Joe Cronin HOF", "Kiki Cuyler HOF", 
"George Davis HOF", "Andre Dawson HOF", "Ed Delahanty HOF", "Bill Dickey HOF", 
"Joe DiMaggio HOF", "Larry Doby HOF", "Bobby Doerr HOF", "Hugh Duffy HOF", 
"Johnny Evers HOF", "Buck Ewing HOF", "Rick Ferrell HOF", "Carlton Fisk HOF", 
"Elmer Flick HOF", "Nellie Fox HOF", "Jimmie Foxx HOF", "Frankie Frisch HOF", 
"Lou Gehrig HOF", "Charlie Gehringer HOF", "Joe Gordon HOF", 
"Goose Goslin HOF", "Hank Greenberg HOF", "Tony Gwynn HOF", "Chick Hafey HOF", 
"Billy Hamilton HOF", "Gabby Hartnett HOF", "Harry Heilmann HOF", 
"Rickey Henderson HOF", "Billy Herman HOF", "Harry Hooper HOF", 
"Rogers Hornsby HOF", "Monte Irvin HOF", "Reggie Jackson HOF", 
"Travis Jackson HOF", "Hughie Jennings HOF", "Al Kaline HOF", 
"Willie Keeler HOF", "George Kell HOF", "Joe Kelley HOF", "High Pockets Kelly HOF", 
"King Kelly HOF", "Harmon Killebrew HOF", "Ralph Kiner HOF", 
"Chuck Klein HOF", "Nap Lajoie HOF", "Barry Larkin HOF", "Tony Lazzeri HOF", 
"Freddie Lindstrom HOF", "Ernie Lombardi HOF", "Mickey Mantle HOF", 
"Heinie Manush HOF", "Rabbit Maranville HOF", "Eddie Mathews HOF", 
"Willie Mays HOF", "Bill Mazeroski HOF", "Tommy McCarthy HOF", 
"Willie McCovey HOF", "Bid McPhee HOF", "Joe Medwick HOF", "Johnny Mize HOF", 
"Paul Molitor HOF", "Joe Morgan HOF", "Eddie Murray HOF", "Stan Musial HOF", 
"Jim O'Rourke HOF", "Mel Ott HOF", "Tony Perez HOF", "Kirby Puckett HOF", 
"Pee Wee Reese HOF", "Jim Rice HOF", "Sam Rice HOF", "Cal Ripken HOF", 
"Phil Rizzuto HOF", "Brooks Robinson HOF", "Frank Robinson HOF", 
"Jackie Robinson HOF", "Edd Roush HOF", "Babe Ruth HOF", "Ryne Sandberg HOF", 
"Ron Santo HOF", "Ray Schalk HOF", "Mike Schmidt HOF", "Red Schoendienst HOF", 
"Joe Sewell HOF", "Al Simmons HOF", "George Sisler HOF", "Enos Slaughter HOF", 
"Ozzie Smith HOF", "Duke Snider HOF", "Tris Speaker HOF", "Willie Stargell HOF", 
"Bill Terry HOF", "Sam Thompson HOF", "Joe Tinker HOF", "Pie Traynor HOF", 
"Arky Vaughan HOF", "Honus Wagner HOF", "Bobby Wallace HOF", 
"Lloyd Waner HOF", "Paul Waner HOF", "Monte Ward HOF", "Zack Wheat HOF", 
"Billy Williams HOF", "Ted Williams HOF", "Hack Wilson HOF", 
"Dave Winfield HOF", "Carl Yastrzemski HOF", "Ross Youngs HOF", 
"Robin Yount HOF"), era = c("Expansion", "Long Ball", "19th Century", 
"Expansion", "Lively Ball", "Integration", "Lively Ball", "Dead Ball", 
"Lively Ball", "Expansion", "19th Century", "Expansion", "Integration", 
"Free Agency", "Lively Ball", "Integration", "Dead Ball", "Free Agency", 
"Expansion", "19th Century", "Integration", "19th Century", "Integration", 
"Expansion", "Lively Ball", "Free Agency", "Expansion", "Dead Ball", 
"Dead Ball", "Expansion", "Dead Ball", "Lively Ball", "Dead Ball", 
"Dead Ball", "Lively Ball", "19th Century", "Dead Ball", "Lively Ball", 
"Lively Ball", "19th Century", "Free Agency", "19th Century", 
"Lively Ball", "Integration", "Integration", "Integration", "19th Century", 
"Dead Ball", "19th Century", "Lively Ball", "Free Agency", "Dead Ball", 
"Integration", "Lively Ball", "Lively Ball", "Lively Ball", "Lively Ball", 
"Integration", "Lively Ball", "Lively Ball", "Free Agency", "Lively Ball", 
"19th Century", "Lively Ball", "Lively Ball", "Free Agency", 
"Lively Ball", "Dead Ball", "Lively Ball", "Integration", "Free Agency", 
"Lively Ball", "Dead Ball", "Expansion", "Dead Ball", "Integration", 
"19th Century", "Lively Ball", "19th Century", "Expansion", "Integration", 
"Lively Ball", "Dead Ball", "Long Ball", "Lively Ball", "Lively Ball", 
"Lively Ball", "Integration", "Lively Ball", "Lively Ball", "Integration", 
"Expansion", "Expansion", "19th Century", "Expansion", "19th Century", 
"Lively Ball", "Integration", "Free Agency", "Expansion", "Free Agency", 
"Integration", "19th Century", "Lively Ball", "Expansion", "Free Agency", 
"Integration", "Free Agency", "Lively Ball", "Free Agency", "Integration", 
"Expansion", "Expansion", "Integration", "Lively Ball", "Lively Ball", 
"Free Agency", "Expansion", "Lively Ball", "Free Agency", "Integration", 
"Lively Ball", "Lively Ball", "Lively Ball", "Integration", "Free Agency", 
"Integration", "Dead Ball", "Expansion", "Lively Ball", "19th Century", 
"Dead Ball", "Lively Ball", "Lively Ball", "Dead Ball", "Dead Ball", 
"Lively Ball", "Lively Ball", "19th Century", "Dead Ball", "Expansion", 
"Integration", "Lively Ball", "Free Agency", "Expansion", "Lively Ball", 
"Free Agency"), HRR = c(0.0610643804593983, 0.0231455968257467, 
0.00943487987549849, 0.00811339198435973, 0.00508130081300813, 
0.00346682606096832, 0.0374626160868881, 0.0160427807486631, 
0.00445558340295182, 0.0543466723277784, 0.00912140910044034, 
0.0507965526247062, 0.0473858371939113, 0.0128540305010893, 0.0293133449337438, 
0.0112788190413004, 0.0058022762776166, 0.0306309788385351, 0.0144212156407278, 
0.0157949634927731, 0.0149253731343284, 0.00890102065036791, 
0.0575505350772889, 0.00987654320987654, 0.00747623624906547, 
0.0406473466315393, 0.0478112779109373, 0.00465224470807164, 
0.00780521901211556, 0.0253860799661519, 0.0102326394962393, 
0.0230218610949894, 0.00472409287365564, 0.00956585724797645, 
0.0100939784197703, 0.0176991150442478, 0.0101358411703239, 0.0224303997888904, 
0.0178745985197598, 0.00807075732448867, 0.0441220912662436, 
0.0134469444814272, 0.0320634920634921, 0.052924791086351, 0.0473074046372476, 
0.0314394473424503, 0.0150482680295287, 0.00195535277823041, 
0.0132388588476599, 0.0046449900464499, 0.0429419826404751, 0.00857602286939432, 
0.00379116117850953, 0.0656503565281534, 0.011523266022827, 0.0616172978377703, 
0.0207674943566591, 0.0443315226914316, 0.0286506469500924, 0.0637396495282111, 
0.0145348837209302, 0.0354594594594595, 0.0063663854846411, 0.0366915422885572, 
0.0235007063053808, 0.0270960678770185, 0.00609835214739847, 
0.00853727945361412, 0.0368285819160651, 0.0396158463385354, 
0.0570762368207624, 0.0221820571804141, 0.00367722165474974, 
0.0394424673784104, 0.00384122919334187, 0.0116383169203223, 
0.00927776191835569, 0.024695478057734, 0.0117068204954191, 0.0703326377807782, 
0.0708933717579251, 0.0462534690101758, 0.00855146522056523, 
0.0249464533198942, 0.0282674289344132, 0.0183567991445375, 0.0324508966695132, 
0.0661565045667736, 0.0143715704206951, 0.0027783290335384, 0.0599742298231229, 
0.0606561896884478, 0.0177949709864603, 0.00859375, 0.0635598389654752, 
0.00638246628131021, 0.0268500327439424, 0.0557193853794816, 
0.0215966774342409, 0.0288886493478495, 0.0444601270289344, 0.0432920160408312, 
0.00729154416088439, 0.0540397631133672, 0.0387604827163019, 
0.0285753727222529, 0.0156366344005957, 0.0464437689969605, 0.00366814111554645, 
0.0373127867717081, 0.00653370013755158, 0.0251548714097991, 
0.05856486108335, 0.0280910395735083, 0.00923536601928562, 0.085010120252411, 
0.0336314847942755, 0.0419992631708216, 0.00207312476441764, 
0.0656130268199234, 0.00990682863545229, 0.00687044307347168, 
0.0350496632035621, 0.0123382121688642, 0.0212685627988925, 0.00297999148573861, 
0.0568356374807988, 0.011476213830309, 0.0599217862999874, 0.0239576851275669, 
0.0210070023341114, 0.0048129172488744, 0.00767297261542532, 
0.0144971307762005, 0.00967525625059872, 0.00394523091204456, 
0.0034740092640247, 0.011946294534306, 0.00339602925809822, 0.0144959367450033, 
0.0455614973262032, 0.0676096548144303, 0.0512605042016807, 0.0422612014905026, 
0.0377043710377044, 0.00907715582450832, 0.0228015988372093)), class = "data.frame", row.names = c(NA, 
-147L))
    

Code:

    hrrSummary <- hofs %>% group_by(era) %>% summarise(minHRR = min(HRR),
                                                       Q1HRR = quantile(HRR, 0.25),
                                                       MedHRR = median(HRR),
                                                       Q3HRR = quantile(HRR,0.75),
                                                       maxHRR = max(HRR),
                                                       meanHRR = mean(HRR),
                                                       rngHRR = range(HRR),
                                                       stdHRR = sd(HRR)) 

The goal here is just to generate some summary stats on the HRR column in the data, grouped by era.

I get the following warning

Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0. ℹ Please use reframe() instead.

And the resulting hrrSummary dataframe has 14 rows, even though there are only seven eras...each era appears to be duplicated.

hrrSummary
# A tibble: 14 × 9
# Groups:   era [7]
   era           minHRR   Q1HRR  MedHRR   Q3HRR maxHRR meanHRR  rngHRR  stdHRR
   <chr>          <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
 1 19th Century 0.00340 0.00807 0.00928 0.0134  0.0210 0.0109  0.00340 0.00459
 2 19th Century 0.00340 0.00807 0.00928 0.0134  0.0210 0.0109  0.0210  0.00459
 3 Dead Ball    0.00196 0.00469 0.00854 0.00991 0.0160 0.00782 0.00196 0.00380
 4 Dead Ball    0.00196 0.00469 0.00854 0.00991 0.0160 0.00782 0.0160  0.00380
 5 Expansion    0.00811 0.0254  0.0420  0.0586  0.0703 0.0410  0.00811 0.0189 
 6 Expansion    0.00811 0.0254  0.0420  0.0586  0.0703 0.0410  0.0703  0.0189 
 7 Free Agency  0.00298 0.0239  0.0355  0.0438  0.0656 0.0342  0.00298 0.0158 
 8 Free Agency  0.00298 0.0239  0.0355  0.0438  0.0656 0.0342  0.0656  0.0158 
 9 Integration  0.00347 0.0141  0.0415  0.0560  0.0709 0.0361  0.00347 0.0225 
10 Integration  0.00347 0.0141  0.0415  0.0560  0.0709 0.0361  0.0709  0.0225 
11 Lively Ball  0.00207 0.00912 0.0223  0.0344  0.0850 0.0246  0.00207 0.0194 
12 Lively Ball  0.00207 0.00912 0.0223  0.0344  0.0850 0.0246  0.0850  0.0194 
13 Long Ball    0.0231  0.0236  0.0240  0.0245  0.0249 0.0240  0.0231  0.00127
14 Long Ball    0.0231  0.0236  0.0240  0.0245  0.0249 0.0240  0.0249  0.00127

What am I doing wrong to elicit this?


Solution

  • While the warning is pretty self explanatory, the duplicated values come from using range. A solution can be to unnest_wider the listed duplicated values of range. IMO, using reframe or summarize is a matter of choice as long as both functions exist but you should be aware of the differences and the reasons why it's been deprecated and adjust accordingly.

    library(dplyr)
    library(tidyr)
    
    hofs %>% 
      group_by(era) %>% 
      summarise(minHRR = min(HRR), 
                Q1HRR = quantile(HRR, 0.25),
                MedHRR = median(HRR),
                Q3HRR = quantile(HRR,0.75),
                maxHRR = max(HRR),
                meanHRR = mean(HRR),
                rngHRR = list(range(HRR)),
                stdHRR = sd(HRR)) %>% 
      unnest_wider(rngHRR, names_sep="_")
    # A tibble: 7 × 10
      era    minHRR   Q1HRR  MedHRR   Q3HRR maxHRR meanHRR rngHRR_1 rngHRR_2  stdHRR
      <chr>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>    <dbl>    <dbl>   <dbl>
    1 19th… 0.00340 0.00807 0.00928 0.0134  0.0210 0.0109   0.00340   0.0210 0.00459
    2 Dead… 0.00196 0.00469 0.00854 0.00991 0.0160 0.00782  0.00196   0.0160 0.00380
    3 Expa… 0.00811 0.0254  0.0420  0.0586  0.0703 0.0410   0.00811   0.0703 0.0189 
    4 Free… 0.00298 0.0239  0.0355  0.0438  0.0656 0.0342   0.00298   0.0656 0.0158 
    5 Inte… 0.00347 0.0141  0.0415  0.0560  0.0709 0.0361   0.00347   0.0709 0.0225 
    6 Live… 0.00207 0.00912 0.0223  0.0344  0.0850 0.0246   0.00207   0.0850 0.0194 
    7 Long… 0.0231  0.0236  0.0240  0.0245  0.0249 0.0240   0.0231    0.0249 0.00127