I have this piece of code that fits a glm, which is used to generate the "midpoint" of a subject's responses [coded as trochaic/iambic, 0 or 1] to a list of numeric stimuli. It saves the midpoint as a value and prints that value in the console.
glm.1 <- glm(coderesponse~stimulus, family = binomial(link="logit"), data=data)
midpoint <- -glm.1$coefficients[1]/glm.1$coefficients[2]
cat(sprintf("file : %s\nmidpoint : %.2f",datafile,midpoint))
At the moment, this code runs over the entire dataframe. I was wondering how to modify this code so that I could run it over various subgroups within my main dataframe and create a new column with those values for each subgroup?
E.g. for each subject, I would like to generate the midpoint value for each block (1-8) within each stimtype ("bd", "nd" and "nm"). That midpoint value would be the new value in the newly created column for all the rows for each block within each stimtype.
We also eventually want to aggregate the values of each block to be reduced to one row containing the midpoint value (rather than keeping all of the rows with the same value).
A small dummy version of my main dataframe (only includes two subjects and stimuli up to 6):
subject <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
stimulus <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 1)
block <- c(3, 3, 3, 7, 7, 7, 4, 4, 4, 8, 8, 8, 1, 1, 1, 5, 5, 5, 2, 2, 2, 6, 6, 6, 3, 3, 3, 7, 7, 7, 4, 4, 4, 8, 8, 8, 2, 2, 2, 6, 6, 6)
blockprocedure <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1)
stimtype <- c('bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm')
blocktype <- c('mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose')
coderesponse <- c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1)
dummy = data.frame(subject, stimulus, block, stimtype, blockprocedure, blocktype, coderesponse)
I initially tried the following, but it's obviously not the way to go:
dummy <- data %>%
group_by(subject, stimtype, block)
dummy$test <- NA
glm.1 <- glm(coderesponse~stimulus, family = binomial(link="logit"), data=dummy)
midpoint <- -glm.1$coefficients[1]/glm.1$coefficients[2]
dummy$test <- midpoint
I'm quite new to coding, so I hope this all makes sense! Thank you for any help/insight!
I think this is a good place to use the combination of tidyr::nest and purrr::map.
Indeed, as ?nest says, "Nesting is often useful for creating per group models".
Here is some code:
library(dplyr)
library(tidyr)
library(purrr)
get_midpoint = function(data){
  glm.1 = glm(coderesponse~stimulus, family = binomial(link="logit"), data=data)
  rtn = -glm.1$coefficients[1]/glm.1$coefficients[2]
  rtn
}
dummy %>%
  nest(data=-c(subject, stimtype, block)) %>%
  mutate(midpoint=map_dbl(data, get_midpoint))
# A tibble: 30 x 5
   subject block stimtype data             midpoint
     <dbl> <dbl> <fct>    <list>              <dbl>
 1       1     3 bd       <tibble [2 x 4]> -1.69e11
 2       1     3 nd       <tibble [2 x 4]> -1.69e11
 3       1     3 nm       <tibble [2 x 4]> -1.69e11
 4       1     7 bd       <tibble [2 x 4]>  3.00e 0
 5       1     7 nd       <tibble [2 x 4]> -1.69e11
 6       1     7 nm       <tibble [2 x 4]> -1.69e11
 7       1     4 bd       <tibble [2 x 4]>  4.00e 0
 8       1     4 nd       <tibble [2 x 4]>  4.00e 0
 9       1     4 nm       <tibble [2 x 4]> -1.96e11
10       1     8 bd       <tibble [2 x 4]>  4.00e 0
Here, you can nest all columns but c(subject, stimtype, block) in a column named data. Then you can map over this column to apply a custom function. As your function returns a double, I used map_dbl.
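If you want the midpoint repeated on every original row (the "new column" part of the question) rather than one row per group, one option is to unnest the data column again after computing it. Here is a minimal sketch, assuming tidyr >= 1.0. The data frame toy and the name with_midpoint are made up for illustration only, so that the example runs on its own:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: one subject, two blocks, six stimuli per block.
# Responses are deliberately not perfectly separated, so glm converges cleanly.
toy <- tibble(
  subject      = 1,
  block        = rep(c(1, 2), each = 6),
  stimulus     = rep(1:6, times = 2),
  coderesponse = c(0, 0, 1, 0, 1, 1,
                   0, 1, 0, 1, 1, 1)
)

get_midpoint <- function(data) {
  fit <- glm(coderesponse ~ stimulus, family = binomial(link = "logit"), data = data)
  # Stimulus value at which the fitted probability is 0.5: -intercept / slope
  unname(-coef(fit)[1] / coef(fit)[2])
}

with_midpoint <- toy %>%
  nest(data = -c(subject, block)) %>%               # one row per group
  mutate(midpoint = map_dbl(data, get_midpoint)) %>% # one midpoint per group
  unnest(data)                                       # back to one row per trial

with_midpoint
```

After unnest, every trial row carries the midpoint of its own group. Dropping the unnest step gives the one-row-per-group version instead.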
You could also use summarise:
dummy %>%
  group_by(subject, stimtype, block) %>%
  summarise(midpoint = get_midpoint(tibble(coderesponse, stimulus)))
This outputs the same result (in a different order, though), with one row per group.
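One caveat about the output above: the huge values such as -1.69e11 appear to come from groups where every response is identical (e.g. both trials coded 1). The fitted slope is then essentially zero, so -intercept/slope blows up and the number is not a meaningful midpoint. One possible guard is to return NA for such groups. This is only a sketch: the name get_midpoint_safe and the toy data frames are hypothetical, and the check does not cover every degenerate case (perfect separation, for instance).

```r
get_midpoint_safe <- function(data) {
  # Assumption: a midpoint is only meaningful if both response values and
  # at least two distinct stimulus values occur in the group.
  if (length(unique(data$coderesponse)) < 2 || length(unique(data$stimulus)) < 2) {
    return(NA_real_)
  }
  fit <- glm(coderesponse ~ stimulus, family = binomial(link = "logit"), data = data)
  unname(-coef(fit)[1] / coef(fit)[2])
}

# Hypothetical groups: one with identical responses, one with a real transition
all_ones <- data.frame(stimulus = c(1, 5), coderesponse = c(1, 1))
mixed    <- data.frame(stimulus = 1:6, coderesponse = c(0, 0, 1, 0, 1, 1))

get_midpoint_safe(all_ones)  # NA: no transition to locate
get_midpoint_safe(mixed)     # a finite midpoint
```

It can be dropped into either pipeline in place of get_midpoint, since it also returns a single double.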