I am making a large data frame using `mutate` with lots of `ifelse` conditions. My approach is to not name the columns within `mutate`, because I have many hundreds of these conditions and each time I update one I then have to update them all. Rather, I wish to name the columns after the operation, outside of `mutate`.
Here is some code outlining what I'm trying to do:
library(dplyr)

df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))
df2 <- df %>%
  mutate(
    # condition 1
    ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
    # condition 2
    ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
    # condition 3
    ifelse(a < b, 1, 0),
    .keep = 'none'
  )
c_names <- paste('df', rep(1:ncol(df2), 1), sep = '')
colnames(df2) <- c_names
The trouble is that `mutate` is truncating the column names of the long `ifelse` conditions (condition 1 and condition 2) and lumping them together as `ifelse(...)`, so I end up with only 2 columns instead of 3.
Is there something I can do to prevent this behaviour, or is there a more efficient way of achieving what I'm trying to do? I want to avoid manually typing out hundreds of column names for each condition every time I need to update the df. Ideally, I would be able to map the identity of the condition back to the new column name. For example:
df3 = ifelse(a < b, 1, 0)
This is possible when `mutate` doesn't repair the column name.
You can wrap the columns in `data.frame`, which does not truncate the names so heavily. (The `mutate` help page notes that the `...` arguments can be "a data frame or tibble, to create multiple columns in the output.")
df2 <- df %>%
  mutate(
    data.frame(
      # condition 1
      ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
      # condition 2
      ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
      # condition 3
      ifelse(a < b, 1, 0)
    ),
    .keep = 'none'
  )
# Rename the columns positionally: df1, df2, df3, ...
c_names <- paste0('df', seq_len(ncol(df2)))
colnames(df2) <- c_names
df2
#    df1 df2 df3
# 1   NA  NA   0
# 2   NA  NA   0
# 3   NA  NA   0
# 4   NA  NA   0
# 5    1  NA   0
# 6    0   0   1
# 7    1   1   1
# 8    1   1   0
# 9    0   0   1
# 10   1   1   0
# 11   0   0   0
# 12   0   0   0
# 13   0   0   1
# 14   1   0   1
# 15   1   0   1
# 16   0   0   1
# 17   0   0   0
# 18   0   0   1
# 19   1   1   1
# 20   1   1   1
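If you also want to map each new column back to the condition that produced it, one possible variation (not part of the approach above, and only a sketch) is to capture the conditions as unevaluated expressions with `rlang::exprs()`, name them programmatically, and splice them into `mutate` with `!!!`. The object names `conds` and `key`, the seed, and the shortened set of conditions are illustrative assumptions.

```r
library(dplyr)

set.seed(1)  # illustrative seed, not from the original post
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))

# Capture the conditions unevaluated; no column names are typed by hand.
# (Only two conditions are shown here to keep the sketch short.)
conds <- rlang::exprs(
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
  ifelse(a < b, 1, 0)
)

# Name the expressions positionally: df1, df2, ...
names(conds) <- paste0("df", seq_along(conds))

# Splice the named expressions into mutate(); each becomes its own column.
df2 <- df %>% mutate(!!!conds, .keep = "none")

# Lookup table from column name to the text of its condition.
key <- data.frame(
  column = names(conds),
  condition = unname(vapply(
    conds,
    function(e) gsub("\\s+", " ", paste(deparse(e), collapse = " ")),
    character(1)
  ))
)
```

Because the names are assigned before `mutate` runs, no automatic naming takes place, and `key` records which expression sits behind each `df*` column.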