Search code examples
rdplyrtidyverse

Create a new calculated column for a dataframe with multiple values per row of the original dataframe


Sometimes I want to calculate multiple values based on one column of each row of a dataframe. In the tidyverse, dplyr now provides reframe as a counterpart to mutate for this, spitting out a longer dataframe, but it does not output the values of the other columns of the original dataframe so one loses the correspondence with the original dataframe.

How should the original dataframe be combined with the new dataframe?

As an example, say we want to produce a report of whether each character from Star Wars is light enough to box in the Olympics in various weight classes.

    boxingPopularWeightClasses <- c(57,80,92,9999) #kilograms
    boxingWeightClassName<-c("Featherweight","Middleweight","Heavyweight","Super Heavyweight")

    checkAllowedToCompeteInEachWeightClass <- function(mass) {
      mass < boxingPopularWeightClasses
    }

    library(dplyr)

    allowed <- starwars %>% rowwise() |>
      reframe( class = boxingWeightClassName,
               criterion = boxingPopularWeightClasses, 
               allowed = checkAllowedToCompeteInEachWeightClass(mass)
      ) 

head(allowed)

The above calculates eligibility (allowed) for each character, spitting out the following, where the other columns such as the character's name have been lost.

# A tibble: 6 × 3
  class             criterion allowed
  <chr>                 <dbl> <lgl>  
1 Featherweight            57 FALSE  
2 Middleweight             80 TRUE   
3 Heavyweight              92 TRUE   
4 Super Heavyweight      9999 TRUE   
5 Featherweight            57 FALSE  
6 Middleweight             80 TRUE  

How can we recombine this with the original dataframe?


Solution

  • I think you just want a cross_join - which will join each row of dataset x to each of y, after which you can do your allowed calculation:

    out <- cross_join(starwars, data.frame(boxingPopularWeightClasses, boxingWeightClassName)) %>%
        mutate(allowed = mass < boxingPopularWeightClasses)
    head(out[c(1:3,15:17)])
    ## A tibble: 6 × 6
    #  name         height  mass boxingPopularWeightC…¹ boxingWeightClassName allowed
    #  <chr>         <int> <dbl>                  <dbl> <chr>                 <lgl>  
    #1 Luke Skywal…    172    77                     57 Featherweight         FALSE  
    #2 Luke Skywal…    172    77                     80 Middleweight          TRUE   
    #3 Luke Skywal…    172    77                     92 Heavyweight           TRUE   
    #4 Luke Skywal…    172    77                   9999 Super Heavyweight     TRUE   
    #5 C-3PO           167    75                     57 Featherweight         FALSE  
    #6 C-3PO           167    75                     80 Middleweight          TRUE   
    ## ℹ abbreviated name: ¹​boxingPopularWeightClasses