Before I start:
I'm aware that there are quite a few threads on this error "out there" but none of them seem to narrow the problem down to very long pipe statements specifically.
By chance I noticed that very long pipe (%>%) statements can trigger the "Error: C stack usage is too close to the limit" error.
Also, some machines I have access to will trigger this earlier while others will trigger it later.
Ubuntu 20.04 with R 4.0.5 (rocker/r-ver:4.0.5) triggers later
Ubuntu 22.04 with R 4.3.2 (rocker/r-ver:4.3.2) triggers earlier
Both machines return 8192 for ulimit -s.
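In addition to ulimit -s, the stack limit as seen from inside the R session can be inspected with base R's Cstack_info(). This is only a sketch of how the two machines could be compared; I have not collected these numbers for the containers above.

```r
# Cstack_info() is part of base R and returns a named integer vector.
info <- Cstack_info()

# "size" is the C stack size R detected (in bytes, NA if unknown),
# "current" is the stack usage at the time of the call.
print(info[c("size", "current")])
```

Comparing the "size" entry across both containers would show whether R itself sees the same limit that ulimit -s reports.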
Since I don't want to paste hundreds of lines of redundant code here, I will give an example that needs to be adapted to reproduce the error on your local machine. Repeating the pipe into the mutate() statement about 1000 times should do the trick on most machines.
library(dplyr)

df <- data.frame(col1 = 1:10000)

df <- df %>%
  # copy and paste the next line until about 1000 mutate() statements are chained
  mutate(col1 = 1) %>%
  mutate(col1 = 1) %>%
  mutate(col1 = 1)
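The copy-and-paste step can also be generated programmatically. The following is a minimal sketch, assuming dplyr is installed; the helper names build_pipe_code() and try_pipe() are made up for this illustration. Whether the error actually appears, and at which chain length, depends on the machine and the installed package versions.

```r
# Hypothetical helper: builds "df %>% mutate(col1 = 1) %>% ..." as text.
build_pipe_code <- function(n_steps) {
  steps <- rep("mutate(col1 = 1)", n_steps)
  paste(c("df", steps), collapse = " %>%\n  ")
}

# Hypothetical helper: evaluates a pipe with n_steps mutate() calls and
# returns either "pipe completed" or the error message that was raised.
try_pipe <- function(n_steps) {
  df <- data.frame(col1 = 1:10000)
  env <- environment()  # evaluate the pipe where df is defined
  tryCatch(
    {
      eval(parse(text = build_pipe_code(n_steps)), envir = env)
      "pipe completed"
    },
    error = function(e) conditionMessage(e)
  )
}

if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  # Small chain as a sanity check; raise this towards 1000 (or beyond)
  # to look for the threshold on a given machine.
  print(try_pipe(10))
}
```

Calling try_pipe() with increasing values makes it possible to compare the threshold between machines without editing the script by hand.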
As a workaround I now break long pipes down into smaller chunks, which is easy enough. I'm still interested in some more background on the topic, though.
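For illustration, chunking could look like the sketch below. The column names col2 to col4 are invented for this example; the point is only that each chunk ends with an assignment, so the next chunk starts from a stored result instead of extending the same chain.

```r
library(dplyr)

df <- data.frame(col1 = 1:10000)

# Chunk 1: a short pipe whose result is stored in df.
df <- df %>%
  mutate(col2 = col1 * 2) %>%
  mutate(col3 = col2 + 1)

# Chunk 2: starts a new pipe from the stored result.
df <- df %>%
  mutate(col4 = col3 - col1)
```

The result is the same as with a single long pipe, but no individual chain grows beyond a handful of steps.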
Why does this trigger that error and does anyone have general thoughts on the topic?
In practice, such long pipe chains may be implemented when creating ADaM datasets with workflows leaning on packages from the admiral family.
Short answer in reference to Roland's comment:
Because very long chained pipe statements can create large stacks of function calls.
Long answer:
In standard R it is unlikely that someone would nest hundreds of function calls to create a certain output, because such a statement would be hard to both read and write. It is much more likely that intermediate results would be generated, stored in individual variables and then processed further by the next functions in the workflow. This effectively prevents very large stacks of function calls and the error described here.
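The difference between nesting and storing intermediate results can be made visible in base R with sys.nframe(), which reports the depth of the current call on the stack. This is a small sketch with a made-up helper, probe(); it illustrates nested calls in general and does not measure what %>% does internally, which may depend on the magrittr version.

```r
# Hypothetical helper: returns the deepest call-stack depth seen so far.
# Its argument is evaluated lazily, i.e. while probe() itself is active.
probe <- function(deepest = 0L) max(deepest, sys.nframe())

top <- sys.nframe()  # depth at the point where this script runs

# Nested: each inner call is evaluated while the outer calls are still active.
nested <- probe(probe(probe())) - top

# Sequential: each call returns before the next one starts.
step1 <- probe()
step2 <- probe(step1)
step3 <- probe(step2)
sequential <- step3 - top

print(c(nested = nested, sequential = sequential))
```

The nested form reports a larger depth than the sequential one, and that depth keeps growing with every additional level of nesting.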
Conversely, the pipe %>% makes it easy (by design) to create very large stacks of function calls that are still readable. Every layer in the pipe adds to the stack, which can ultimately lead to the reported error if pushed to the extreme.