I'm stuck on how to write a dplyr solution for something that I regularly do in plyr.
Here's the plyr example:
# load packages
if (!require("pacman")) install.packages("pacman")
# call p_load via its namespace so it also works right after a fresh install
pacman::p_load(dplyr, plyr, Hmisc, tidyverse)
# generate data
df_samp <- tibble(
  x_var = rnorm(100, 0, 1),
  levels = rep(c('a', 'b', 'c', 'd'), 25)
)
# working plyr solution that groups data by level and calculates the empirical CDF within levels
plyr_solution <- plyr::ddply(df_samp, ~ levels, summarise,
                             X = wtd.Ecdf(x_var)$x,
                             Y = wtd.Ecdf(x_var)$ecdf)
plyr_solution
# dplyr attempt
dplyr_solution <- df_samp %>%
  dplyr::select(levels, x_var) %>%
  dplyr::group_by(levels) %>%
  dplyr::mutate(
    X = Hmisc::wtd.Ecdf(x_var)$x,
    Y = Hmisc::wtd.Ecdf(x_var)$ecdf
  )
I'd appreciate any advice on how to debug the current dplyr attempt, or another approach entirely that uses dplyr.
How about this (it also requires tidyr, though):
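dplyr_solution <- df_samp %>%
  dplyr::select(levels, x_var) %>%
  dplyr::group_by(levels) %>%
  # do() with named arguments stores each result in a list column
  dplyr::do(X = wtd.Ecdf(.$x_var)$x,
            Y = wtd.Ecdf(.$x_var)$ecdf) %>%
  # name the list columns explicitly; newer tidyr versions warn without cols
  tidyr::unnest(cols = c(X, Y))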
You can't use mutate because, as ?mutate says, mutate "preserves the number of rows of the input", but here you need to change the number of rows.
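To see the mismatch concretely, here is a minimal sketch on a small hand-made vector (the vector and the name ecdf_pts are just for illustration, not from the question). As far as I can tell, wtd.Ecdf prepends a starting point with an ECDF value of 0 when the first estimate is above zero, so it returns one more point than there are distinct values:

```r
# Illustrative input: five distinct values (not the question's data)
x_var <- c(3, 1, 2, 5, 4)

# wtd.Ecdf returns a list with elements x and ecdf
ecdf_pts <- Hmisc::wtd.Ecdf(x_var)

length(x_var)         # number of input rows
length(ecdf_pts$x)    # number of ECDF points, one more than the input here
length(ecdf_pts$ecdf)
```

Because the two lengths differ, mutate has no way to line the result up with the rows of each group.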
Edit:
Having thought about it a bit more, you don't need tidyr::unnest if you do this:
dplyr_solution2 <- df_samp %>%
  dplyr::select(levels, x_var) %>%
  dplyr::group_by(levels) %>%
  dplyr::do(data.frame(X = wtd.Ecdf(.$x_var)$x,
                       Y = wtd.Ecdf(.$x_var)$ecdf))
Edit no 2:
You are right, dplyr::do is mostly deprecated. I was going to suggest a purrr solution, but you had specifically requested dplyr. I always assumed group_map was part of purrr (I guess I discovered them at the same time).
You can essentially just swap do for group_map, with a very minor change in syntax:
dplyr_solution3 <- df_samp %>%
  dplyr::select(levels, x_var) %>%
  dplyr::group_by(levels) %>%
  dplyr::group_map(~data.frame(X = wtd.Ecdf(.$x_var)$x,
                               Y = wtd.Ecdf(.$x_var)$ecdf))
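One caveat, as far as I know: from dplyr 0.8.1 onwards, group_map returns a plain list with one element per group, and group_modify is the variant that returns a grouped tibble with the levels column kept. Here is a minimal sketch assuming a recent dplyr; the seed, the name dplyr_solution4, and the pts helper variable are my own additions, not part of the question:

```r
library(dplyr)

# Same shape as the question's data; the seed is only there for reproducibility
set.seed(1)
df_samp <- tibble(
  x_var = rnorm(100, 0, 1),
  levels = rep(c("a", "b", "c", "d"), 25)
)

dplyr_solution4 <- df_samp %>%
  select(levels, x_var) %>%
  group_by(levels) %>%
  group_modify(~ {
    # .x is the subset of rows for the current group
    pts <- Hmisc::wtd.Ecdf(.x$x_var)
    # return a data frame; the grouping column is added back automatically
    tibble(X = pts$x, Y = unname(pts$ecdf))
  })

dplyr_solution4
```

Calling wtd.Ecdf once per group and reusing the result also avoids computing the ECDF twice.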
Or you can swap to purrr::map_dfr:
purrr_solution <- df_samp %>%
  dplyr::select(levels, x_var) %>%
  split(.$levels) %>%
  purrr::map_dfr(~data.frame(X = wtd.Ecdf(.$x_var)$x,
                             Y = wtd.Ecdf(.$x_var)$ecdf),
                 .id = "levels")
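If your dplyr is 1.1.0 or newer, reframe may be the closest dplyr-only fit, since (as I understand it) it is meant for summaries that return any number of rows per group. This is a sketch under that assumption; the seed and the names reframe_solution and pts are mine:

```r
library(dplyr)

# Same shape as the question's data; the seed is only there for reproducibility
set.seed(1)
df_samp <- tibble(
  x_var = rnorm(100, 0, 1),
  levels = rep(c("a", "b", "c", "d"), 25)
)

reframe_solution <- df_samp %>%
  group_by(levels) %>%
  reframe({
    # compute the ECDF once per group
    pts <- Hmisc::wtd.Ecdf(x_var)
    # an unnamed data frame result is unpacked into columns X and Y
    tibble(X = pts$x, Y = unname(pts$ecdf))
  })

reframe_solution
```

Unlike the do and group_modify versions, the result comes back ungrouped.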