
I want to use a previously defined function inside mutate(), but R does not give me the result I expect


I am looking at population data and want to make sure I have enough observations to do county-level analysis. Therefore I would like to generate a variable that assigns each observation the number of observations with the same value in the "county" column.

I want to assign each row in my data frame ("cps") a new variable ("freq") which represents the frequency of its specific value in one specific variable ("county"). I used

f <- function(x)sum(with(cps, county==x))

to generate a function that tells me how often a given county x appears in the data. Now I want to use

cps <- mutate(cps, freq=f(county))

to assign each row the number of times its county value appears in the data frame. However, every row ends up with the overall number of observations instead.


Solution

  • You can get what you want using dplyr::add_count():

    library(dplyr)
    mpg %>% add_count(cyl, name = "freq")
    
    # A tibble: 234 × 12
       manufacturer model      displ  year   cyl trans      drv     cty   hwy fl    class    freq
       <chr>        <chr>      <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>   <int>
     1 audi         a4           1.8  1999     4 auto(l5)   f        18    29 p     compact    81
     2 audi         a4           1.8  1999     4 manual(m5) f        21    29 p     compact    81
     3 audi         a4           2    2008     4 manual(m6) f        20    31 p     compact    81
     4 audi         a4           2    2008     4 auto(av)   f        21    30 p     compact    81
     5 audi         a4           2.8  1999     6 auto(l5)   f        16    26 p     compact    79
     6 audi         a4           2.8  1999     6 manual(m5) f        18    26 p     compact    79
     7 audi         a4           3.1  2008     6 auto(av)   f        18    27 p     compact    79
     8 audi         a4 quattro   1.8  1999     4 manual(m5) 4        18    26 p     compact    81
     9 audi         a4 quattro   1.8  1999     4 auto(l5)   4        16    25 p     compact    81
    10 audi         a4 quattro   2    2008     4 manual(m6) 4        20    28 p     compact    81
    # … with 224 more rows
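
    Applied to the data in the question, the same call might look roughly like the sketch below. The `cps` data frame here is a made-up stand-in (the real one isn't shown), and the `min_obs` cutoff is an arbitrary assumption for illustration:

    ```r
    library(dplyr)

    # Made-up stand-in for the question's `cps` data; only `county` matters here.
    cps <- data.frame(
      id     = 1:6,
      county = c("Adams", "Adams", "Baker", "Adams", "Clark", "Clark")
    )

    # One row per observation, plus the number of rows sharing its county.
    cps_counted <- add_count(cps, county, name = "freq")

    # Same result via an explicit group_by() + n().
    cps_grouped <- cps %>%
      group_by(county) %>%
      mutate(freq = n()) %>%
      ungroup()

    # Keep only counties with at least `min_obs` observations (arbitrary cutoff).
    min_obs <- 2
    cps_enough <- filter(cps_counted, freq >= min_obs)
    ```

    Since the goal is to check whether each county has enough observations, filtering on `freq` afterwards is probably the step you'd want next.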
    

    But if you wanted to use your function, you'd need to wrap it in sapply() (or purrr::map_int()) so that each element of x is compared, one at a time, against every element of the column:

    f <- function(x) sapply(x, \(x_i) sum(with(mpg, cyl == x_i)))
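
    As for why the original version returns the total: mutate() passes the whole column to the function, so the comparison runs element-wise against itself and every element is TRUE. A small base R sketch, using a made-up `county` vector, shows the difference:

    ```r
    # Hypothetical county values standing in for cps$county.
    county <- c("A", "B", "A", "C", "A")

    # What the original f() effectively computes inside mutate():
    # the column compared with itself is all TRUE, so the sum is the row count.
    whole_column <- sum(county == county)

    # Comparing one value at a time gives the per-row frequency instead.
    per_row <- sapply(county, function(x_i) sum(county == x_i), USE.NAMES = FALSE)
    ```

    mutate() then recycles the single number in `whole_column` to every row, which is the behaviour described in the question.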
    

    You can also generalize it to work with any column:

    f2 <- function(x) sapply(x, \(x_i) sum(x == x_i))
    
    mutate(mpg, freq=f2(drv))
    
    # A tibble: 234 × 12
       manufacturer model      displ  year   cyl trans      drv     cty   hwy fl    class    freq
       <chr>        <chr>      <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>   <int>
     1 audi         a4           1.8  1999     4 auto(l5)   f        18    29 p     compact   106
     2 audi         a4           1.8  1999     4 manual(m5) f        21    29 p     compact   106
     3 audi         a4           2    2008     4 manual(m6) f        20    31 p     compact   106
     4 audi         a4           2    2008     4 auto(av)   f        21    30 p     compact   106
     5 audi         a4           2.8  1999     6 auto(l5)   f        16    26 p     compact   106
     6 audi         a4           2.8  1999     6 manual(m5) f        18    26 p     compact   106
     7 audi         a4           3.1  2008     6 auto(av)   f        18    27 p     compact   106
     8 audi         a4 quattro   1.8  1999     4 manual(m5) 4        18    26 p     compact   103
     9 audi         a4 quattro   1.8  1999     4 auto(l5)   4        16    25 p     compact   103
    10 audi         a4 quattro   2    2008     4 manual(m6) 4        20    28 p     compact   103
    # … with 224 more rows
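
    One caveat with the sapply() versions is that they compare every element against every other element, which can get slow on a large data set. A possible alternative, sketched here in base R under the assumption that the column has no missing values, is to count each distinct value once with table() and then look the counts up. The name `f3` is just for illustration:

    ```r
    # Count each distinct value once, then index the counts by value.
    # Assumes x contains no NA: table() drops NA by default, so NA entries
    # would come back as NA rather than a count.
    f3 <- function(x) {
      counts <- table(x)
      as.integer(counts[as.character(x)])
    }

    # Small made-up vector, in the style of mpg$drv.
    drv_example  <- c("f", "f", "4", "f", "r")
    freq_example <- f3(drv_example)
    ```

    It can be used the same way as f2, e.g. mutate(mpg, freq = f3(drv)).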