Tags: r, dplyr, tidyverse, tidyr, mutate

Stop mutate from truncating column names


I am building a large data frame with mutate, using lots of ifelse conditions. I don't name the columns inside mutate, because I have many hundreds of these conditions and every time I update one I would then have to update all the names. Instead, I want to name the columns after the operation, outside of mutate.

Here is some code outlining what I'm trying to do:

library(dplyr)

df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))

df2 <- df %>%
    mutate(# condition 1
           ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0), 
           # condition 2
           ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
                  (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
           # condition 3
           ifelse(a < b, 1, 0),
           .keep = 'none'
           )

c_names <- paste('df', rep(1:ncol(df2), 1), sep = '')
colnames(df2) <- c_names
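Conditions 1 and 2 above differ only in how many lags they compare (lags 2 to 4 versus lags 2 to 6). Assuming the other conditions follow the same pattern, which the question does not confirm, a small helper can generate the chain of comparisons so each condition shrinks to a one-line call. This is a sketch, not the asker's method; `below_all_lags()` is a hypothetical name, not a dplyr function, and `set.seed(1)` is added only to make the data reproducible.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): 1 where the lag-1 spread (a - c)
# is below the (a - b) spread at every lag from 2 to max_lag, else 0.
below_all_lags <- function(a, b, c, max_lag) {
  ref <- dplyr::lag(a, 1) - dplyr::lag(c, 1)
  checks <- lapply(2:max_lag, function(k) ref < (dplyr::lag(a, k) - dplyr::lag(b, k)))
  # Reduce(`&`, ...) chains the comparisons left to right, exactly like
  # writing check2 & check3 & ... by hand, so NA handling is unchanged.
  ifelse(Reduce(`&`, checks), 1, 0)
}

set.seed(1)
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))

df2 <- df %>%
  mutate(df1 = below_all_lags(a, b, c, 4),  # condition 1
         df2 = below_all_lags(a, b, c, 6),  # condition 2
         df3 = ifelse(a < b, 1, 0),         # condition 3
         .keep = "none")
```

Because each condition is now short, naming it inline costs little, and changing a lag range means editing one argument rather than rewriting a block of comparisons.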

The trouble is that mutate shortens the automatically generated names of the long ifelse conditions (condition 1 and condition 2) to ifelse(...). Because both end up with the same name, the second overwrites the first, so I end up with only 2 columns instead of 3.

Is there something I can do to prevent this behaviour, or a more efficient way of achieving what I'm trying to do? I want to avoid manually typing out hundreds of column names, one per condition, every time I need to update the df. Ideally I would be able to map each new column name back to the condition that produced it. For example:

df3 = ifelse(a < b, 1, 0)

This would be possible if mutate didn't repair the column names.
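One way to avoid both typing names and losing columns is to capture the conditions as unevaluated expressions with `rlang::exprs()`, name them programmatically, and splice them into mutate with `!!!`. This is a minimal sketch under assumptions: the df1, df2, ... naming scheme and the `cond_lookup` table are illustrative choices, and `set.seed(1)` is added only for reproducibility.

```r
library(dplyr)

set.seed(1)
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))

# Capture the conditions unevaluated; their order fixes the column index.
conds <- rlang::exprs(
  # condition 1
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
  # condition 2
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
  # condition 3
  ifelse(a < b, 1, 0)
)

# Generate the names instead of typing them (df1, df2, ... is an arbitrary scheme).
names(conds) <- paste0("df", seq_along(conds))

# Splice the named expressions into mutate(); each one gets its own column,
# so nothing is auto-named and nothing is overwritten.
df2 <- df %>% mutate(!!!conds, .keep = "none")

# Lookup table mapping each generated column name back to its condition text.
cond_lookup <- data.frame(
  column = names(conds),
  condition = vapply(conds, rlang::expr_text, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
```

Adding, removing, or editing a condition only touches the `exprs()` list; the names and the lookup table are regenerated from it.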


Solution

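  • You can wrap the conditions in data.frame, which does not truncate the names so heavily and, with its default check.names = TRUE, makes any duplicated names unique, so no column is overwritten. (The mutate help page notes that the ... arguments can be "a data frame or tibble, to create multiple columns in the output.")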

    library(dplyr)

    df2 <- df %>%
        mutate(
          data.frame(
               # condition 1
               ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
                      (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
                      (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0), 
               # condition 2
               ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
                      (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
                      (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
                      (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
                      (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
               # condition 3
               ifelse(a < b, 1, 0)
              ),
              .keep = 'none'
            )
    
    c_names <- paste0('df', seq_len(ncol(df2)))
    colnames(df2) <- c_names
    df2
    #    df1 df2 df3
    # 1   NA  NA   0
    # 2   NA  NA   0
    # 3   NA  NA   0
    # 4   NA  NA   0
    # 5    1  NA   0
    # 6    0   0   1
    # 7    1   1   1
    # 8    1   1   0
    # 9    0   0   1
    # 10   1   1   0
    # 11   0   0   0
    # 12   0   0   0
    # 13   0   0   1
    # 14   1   0   1
    # 15   1   0   1
    # 16   0   0   1
    # 17   0   0   0
    # 18   0   0   1
    # 19   1   1   1
    # 20   1   1   1
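The data.frame wrapper works because of how base R's data.frame() names unnamed arguments: it deparses each argument expression and, with the default check.names = TRUE, adjusts the names with make.names so they are syntactically valid and not duplicated. A minimal base-R illustration, using arbitrary toy expressions (x + 1 and x * 2 are not from the question):

```r
x <- 1:3

# Unnamed arguments are named from their deparsed expressions; the
# duplicate "x + 1" is made unique instead of replacing the first column.
d <- data.frame(x + 1, x + 1, x * 2)
names(d)
# -> "x...1" "x...1.1" "x...2"

# With check.names = FALSE the raw deparsed names are kept, duplicates included.
d_raw <- data.frame(x + 1, x + 1, x * 2, check.names = FALSE)
names(d_raw)
# -> "x + 1" "x + 1" "x * 2"
```

Since the columns are renamed to df1, df2, ... straight afterwards, the exact repaired names do not matter; what matters is that none of them collide.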