There are problems that inherently require several layers of nesting to be solved. In a current project, I frequently find myself using three nested apply
s in order to do something with the elements contained in the deepest layer of a nested list structure.
R's list handling and apply
-family allows for quite concise code for this type of problems, but writing it still gives me headaches and I'm pretty sure that anyone else reading it will have to take several minutes to understand what I am doing. This is although I'm basically doing nothing complicated - if it weren't for the multiple layers of lists to be traversed.
Below I provide a snippet of code that I wrote that shall serve as an example. I would consider it concise, yet hard to read.
Context: I am writing a simulation of surface electromyographic data, i.e. changes in the electrical potential that can be measured on the human skin and which is invoked by muscular activity. For this, I consider several muscles (first layer of lists) which each consist of a number of so-called motor units (second layer of lists) which each are related to a set of electrodes (third layer of lists) placed on the skin. In the example code below, we have the objects .firing.contribs
, which contains the information on how strong the action of a motor unit influences the potential at a specific electrode, and .firing.instants
which contains the information on the instants in time at which these motor units are fired. The given function then calculates the time course of the potential at each electrode.
Here's the question: What are possible options to render this type of code easily understable for the reader? Particularly: How can I make clearer what I am actually doing, i.e. performing some computation on each (MU, electrode) pair and then summing up all potential contributions to each electrode? I feel this is currently not very obvious in the code.
Notes:
Here's the example code.
sum.MU.firing.contributions <- function(.firing.contribs, .firing.instants) {
calc.MU.contribs <- function(.MU.firing.contribs, .MU.firing.instants)
lapply(.MU.firing.contribs, calc.MU.electrode.contrib,
.MU.firing.instants)
calc.muscle.contribs <- function(.MU.firing.contribs, .MU.firing.instants) {
MU.contribs <- mapply(calc.MU.contribs, .MU.firing.contribs,
.MU.firing.instants, SIMPLIFY = FALSE)
muscle.contribs <- reduce.contribs(MU.contribs)
}
muscle.contribs <- mapply(calc.muscle.contribs, .firing.contribs,
.firing.instants, SIMPLIFY = FALSE)
surface.potentials <- reduce.contribs(muscle.contribs)
}
## Takes a list (one element per object) of lists (one element per electrode)
## that contain the time course of the contributions of that object to the
## surface potential at that electrode (as numerical vectors).
## Returns a list (one element per electrode) containing the time course of the
## contributions of this list of objects to the surface potential at all
## electrodes (as numerical vectors again).
reduce.contribs <- function(obj.list) {
contribs.by.electrode <- lapply(seq_along(obj.list[[1]]), function(i)
sapply(obj.list, `[[`, i))
contribs <- lapply(contribs.by.electrode, rowSums)
}
calc.MU.electrode.contrib <- function(.MU.firing.contrib, .MU.firing.instants) {
## This will in reality be more complicated since then .MU.firing.contrib
## will have a different (more complicated) structure.
.MU.firing.contrib * .MU.firing.instants
}
firing.contribs <- list(list(list(1,2),list(3,4)),
list(list(5,6),list(7,8),list(9,10)))
firing.instants <- list(list(c(0,0,1,0), c(0,1,0,0)),
list(c(0,0,0,0), c(1,0,1,0), c(0,1,1,0)))
surface.potentials <- sum.MU.firing.contributions(firing.contribs, firing.instants)
As suggested by user @alexis_laz, it is a good option to actually not use a nested list structure for representing data, but rather use a (flat, 2D, non-nested) data.frame. This greatly simplified coding and increased code conciseness and readability for the above example and seems to be promising for other cases, too.
It completely eliminates the need for nested apply
-foo and complicated list traversing. Moreover, it allows for the usage of a lot of built-in R functionality that works on data frames.
I think there are two main reasons why I did not consider this solution before:
Here is the rewritten example. Comments are most welcome.
sum.MU.firing.contributions <- function(.firing.contribs, .firing.instants) {
firing.info <- merge(.firing.contribs, .firing.instants)
firing.info$contribs.time <- mapply(calc.MU.electrode.contrib,
firing.info$contrib,
firing.info$instants,
SIMPLIFY = FALSE)
surface.potentials <- by(I(firing.info$contribs.time),
factor(firing.info$electrode),
function(list) colSums(do.call(rbind, list)))
surface.potentials
}
calc.MU.electrode.contrib <- function(.MU.firing.contrib, .MU.firing.instants) {
## This will in reality be more complicated since then .MU.firing.contrib
## will have a different (more complicated) structure.
.MU.firing.contrib * .MU.firing.instants
}
firing.instants <- data.frame(muscle = c(1,1,2,2,2),
MU = c(1,2,1,2,3),
instants = I(list(c(F,F,T,F),c(F,T,F,F),
c(F,F,F,F),c(T,F,T,F),c(F,T,T,F))))
firing.contribs <- data.frame(muscle = c(1,1,1,1,2,2,2,2,2,2),
MU = c(1,1,2,2,1,1,2,2,3,3),
electrode = c(1,2,1,2,1,2,1,2,1,2),
contrib = c(1,2,3,4,5,6,7,8,9,10))
surface.potentials <- sum.MU.firing.contributions(firing.contribs,
firing.instants)