I am looking at population data and want to make sure I have enough observations to do county-level analysis. Therefore, I would like to generate a variable that assigns each observation the number of observations with the same value in the "county" column.
I want to assign each row in my data frame ("cps") a new variable ("freq") which represents the frequency of its specific value in one specific variable ("county"). I used
f <- function(x)sum(with(cps, county==x))
to generate a function that tells me how often a given county x appears in the data. Now I want to use
cps <- mutate(cps, freq=f(county))
to assign each row the number of times its county value appears in the data frame. However, it assigns every row the overall number of observations instead.
You can get what you want using dplyr::add_count():
library(dplyr)
library(ggplot2)  # provides the mpg example data
mpg %>% add_count(cyl, name = "freq")
# A tibble: 234 × 12
manufacturer model displ year cyl trans drv cty hwy fl class freq
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr> <int>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact 81
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact 81
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact 81
4 audi a4 2 2008 4 auto(av) f 21 30 p compact 81
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact 79
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact 79
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact 79
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact 81
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact 81
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact 81
# … with 224 more rows
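Applied to the question's setup, the same call might look like the sketch below. The toy `cps` data frame, the `id` column, and the `min_obs` threshold are made up for illustration; only the `county` and `freq` names come from the question.

```r
library(dplyr)

# Hypothetical toy stand-in for the asker's `cps` data frame
cps <- tibble(
  id     = 1:6,
  county = c("A", "B", "A", "C", "A", "B")
)

# add_count() keeps every row and appends the size of its county group
cps <- cps %>% add_count(county, name = "freq")

# Keep only rows from counties with at least `min_obs` observations
# (the threshold is an assumption, pick whatever your analysis needs)
min_obs <- 2
cps_enough <- cps %>% filter(freq >= min_obs)
```

Here county "C" appears only once, so its row is dropped from `cps_enough`.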
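But if you want to keep your own function, you need to wrap the comparison in sapply() (or purrr::map_int()). As written, mutate() passes the whole column to f(), so county == x compares the column with itself element by element; every comparison is TRUE and the sum is the total number of rows. Wrapping it in sapply() compares each element of x against every element instead: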
f <- function(x) sapply(x, \(x_i) sum(with(mpg, cyl == x_i)))
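To see why the original version returned the total row count, here is a small sketch on a made-up `county` vector (the values are hypothetical, not from the question's data):

```r
# Hypothetical example column
county <- c("A", "B", "A", "C", "A", "B")

# Inside mutate(), f() receives the whole column at once, so `county == x`
# compares the column with itself, element by element: every result is TRUE.
elementwise <- county == county
sum(elementwise)  # the number of rows, not a per-county count

# Comparing against one value at a time gives the per-county count instead.
per_row <- sapply(county, function(x_i) sum(county == x_i), USE.NAMES = FALSE)
per_row
```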
You can also generalize it to work with any column:
f2 <- function(x) sapply(x, \(x_i) sum(x == x_i))
mutate(mpg, freq=f2(drv))
# A tibble: 234 × 12
manufacturer model displ year cyl trans drv cty hwy fl class freq
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr> <int>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact 106
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact 106
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact 106
4 audi a4 2 2008 4 auto(av) f 21 30 p compact 106
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact 106
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact 106
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact 106
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact 103
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact 103
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact 103
# … with 224 more rows
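One caveat with the sapply() versions: they compare every element against every other element, so the work grows quadratically with the number of rows. If that becomes slow on a large data set, one possible alternative is to count each value once with table() and then look the counts up. The helper name `f3` below is made up for this sketch.

```r
# Hypothetical helper: count each distinct value once, then look up
# the count for every element by name.
f3 <- function(x) {
  counts <- table(x)                   # one count per distinct value
  as.integer(counts[as.character(x)])  # NA inputs yield NA counts
}

f3(c("f", "f", "4", "r", "4", "f"))
```

It can be used the same way as f2, for example mutate(mpg, freq = f3(drv)).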