In my R shiny App, I want to allow the user, by means of check boxes, to choose columns (instrument parameters in data), and for each of those which functions to apply (i.e. which statistics to apply):
Calculate for columns x,y,z
different functions passed as text strings, selected for that column,
do this per group (factor column with names, called 'category' here.) and
name the resulting columns (name + function-name) in the output df
Selected checkboxes will produce a named list, where the name is the column (parameter name in df), and each named list element contains a vector of text string names of the functions to apply
My list of function requests would look like this:
functionlist <- list(c1 = c('mean', 'sum'),
c2 = 'Length',
c3 = c('Min', 'Max'),
c4 = c('mean', 'sd', 'sum'))
Therefor I'm looking for a fast was to apply different sets of functions to different columns in 1 go and get the results in the form of a dataframe
The data to apply it to would be comparable to this dummy data:
library(data.table)
n = 100000
dt = data.table(index=1:100000,
category = sample(letters[1:25], n, replace = T),
c1=rnorm(n,10000),
c2=rnorm(n,1000),
c3=rnorm(n,100),
c4 = rnorm(n,10)
)
p.s. it is a follow up of this question: SO question, but now with the extra complication of applying different functions to different columns
UPDATE it would be even nicer if the user can also choose which groups to calculate for.
First I convert to lowercase, because some of these functions as written don't exist (at least in base R). Then I set the names so the output columns will have names. Next, I lapply
over the vector of function names and apply them to get(.y)
, which is the name of the current element in functionlist
, i.e. the column name.
Some explanations for the less standard stuff here:
imap
from tidyverse (purrr specifically) is like lapply
, but instead of writing function(x)
and referring to the element in your function body as x
, you can just write ~
and use .x
to refer to the element in the list and .y
for the name of that element. imap_dfc
is like imap
, but it cbinds
all the results together as on data frame.get(x)
will search for an object whose name is the character string assigned to x
and return that object. So if f
and x
are assigned as f <- 'mean'
and x = 'c1'
, then get(f)(get(x))
is equivalent to mean(c1)
.library(tidyverse)
imap_dfc(functionlist, ~{
.x <- tolower(.x)
.x <- setNames(.x, paste0(.y, '_', .x))
dt[, lapply(.x, function(f) get(f)(get(.y)))
, by = category][,-'category']
})[, category := unique(dt$category)]
# c1_mean c1_sum c2_length c3_min c3_max c4_mean c4_sd c4_sum category
# 1: 9999.988 39689953 3969 96.32998 103.3013 9.999057 1.0047397 39686.26 i
# 2: 9999.992 40749969 4075 96.45056 103.4090 9.990428 1.0018953 40710.99 e
# 3: 9999.980 39769919 3977 96.93850 103.6276 10.014546 1.0231273 39827.85 p
# 4: 9999.989 40379955 4038 96.04255 103.6632 9.999367 0.9892175 40377.45 y
# 5: 10000.008 39550031 3955 96.25407 103.7432 10.003377 1.0169810 39563.36 d
# 6: 9999.997 38809990 3881 96.29265 104.3206 9.993976 1.0026220 38786.62 q
# 7: 10000.021 39910082 3991 96.50937 103.4453 9.989248 0.9877364 39867.09 o
# 8: 10000.028 39860113 3986 96.18319 103.8271 9.996190 0.9688054 39844.81 w
# 9: 10000.006 39860025 3986 96.89095 103.8927 10.004848 1.0097102 39879.32 n
# 10: 10000.018 39700073 3970 96.36530 103.4446 10.012726 1.0140592 39750.52 s
# 11: 9999.997 39939988 3994 96.40691 103.7046 9.983768 1.0036815 39875.17 u
# 12: 9999.985 39809941 3981 96.22220 104.0094 10.026315 1.0019427 39914.76 g
# 13: 10000.013 40330054 4033 96.29462 103.8641 9.983023 0.9943337 40261.53 x
# 14: 10000.003 39060012 3906 96.42303 103.7972 9.997054 0.9981013 39048.49 j
# 15: 10000.002 39640008 3964 96.30402 103.8221 10.003373 1.0198045 39653.37 h
# 16: 10000.003 40860011 4086 96.54114 103.4493 9.978961 0.9921391 40774.04 l
# 17: 9999.978 40829908 4083 96.18487 103.7403 9.997847 1.0126861 40821.21 f
# 18: 9999.964 39249859 3925 96.22323 103.6110 9.994997 0.9965463 39230.36 m
# 19: 9999.964 39959858 3996 96.37931 103.4358 10.020087 1.0149939 40040.27 v
# 20: 10000.014 40760057 4076 96.22407 104.0107 10.015623 0.9981900 40823.68 b
# 21: 9999.998 40019993 4002 96.77786 103.7248 10.000020 1.0099821 40020.08 k
# 22: 9999.981 41449920 4145 96.73103 103.8441 10.032406 1.0161685 41584.32 t
# 23: 10000.010 39320039 3932 95.83859 103.2523 9.970763 0.9953307 39205.04 a
# 24: 9999.984 40769934 4077 96.05744 103.3219 9.995989 1.0073559 40753.65 r
# 25: 10000.001 39720003 3972 96.51861 103.3922 9.945475 0.9816170 39503.43 c
# c1_mean c1_sum c2_length c3_min c3_max c4_mean c4_sd c4_sum category