
Changing column names then converting all negative values to NA


Hi, I am trying to rename my columns and then convert all negative values in every column to NA. I got the second part right, but for some reason the new column names do not stick. This is my code; note that mscr is the data frame read from the CSV whose column names I wish to change, and df2 is meant to be the renamed copy. Thank you for your time and help.

df2 <- mscr %>%
  rename(
    caseid = R0000100,
    children2000 = R6389600
    )

df2 <- mscr
df2[df2 < 0] <- NA

Solution

  • I might be misunderstanding, but I think what you're doing is renaming the columns (successfully), and then over-writing the newly-renamed data with the original. That is,

    df2 <- mscr %>% rename(...)
    

    is correct, and the names should then be changed. The moment you then do

    df2 <- mscr
    

    before replacing the negative values, you throw away the renamed data and start again from the original.

    rename (like just about every "verb" function in dplyr, and many functions in R) operates in a purely functional manner: it returns a modified copy and leaves the input data completely unchanged. Changing the input in place would be a "side effect", which runs against the normal, idiomatic way of doing things in R.
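
    To make the point about copies concrete, here is a minimal sketch. The data frame mscr_toy and its values are made up for illustration; only the two column names come from the question.

    ```r
    library(dplyr)

    # Hypothetical stand-in for mscr, using the two column names from the question
    mscr_toy <- data.frame(R0000100 = c(1, 2, 3), R6389600 = c(-4, 0, 2))

    # rename() returns a new data frame; mscr_toy itself is not modified
    renamed <- mscr_toy %>%
      rename(caseid = R0000100, children2000 = R6389600)

    names(mscr_toy)  # still the original names
    names(renamed)   # the new names

    # Assigning the original afterwards discards the renamed copy
    df2 <- renamed
    df2 <- mscr_toy
    names(df2)       # back to the original names
    ```

    The last two assignments mirror what the code in the question does: the second one replaces the renamed result with the untouched original.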

    Try this:

    library(dplyr)
    df2 <- mscr %>%
      rename(
        caseid = R0000100,
        children2000 = R6389600
      ) %>% 
      mutate(across(everything(), ~ if_else(. < 0, .[NA], .)))
    

    One would normally want to use just NA, but a bare NA is of class logical, and I'm inferring that your data is numeric or integer, so we need an NA of the matching class. One option is to do this step separately for numeric (double) and for integer columns, using NA_real_ and NA_integer_, respectively. However, .[NA] gives an NA of the same class as the original column, so a single call covers both cases.
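
    As a rough illustration of the two options, the sketch below compares the per-type approach with the .[NA] trick. The data frame toy and its values are hypothetical, and it assumes a dplyr version that provides across() and where() (1.0.0 or later).

    ```r
    library(dplyr)

    # Hypothetical data: one integer column and one double column
    toy <- data.frame(
      R0000100 = c(1L, -2L, 3L),
      R6389600 = c(-4.5, 0, 2)
    )

    class(NA)                 # a bare NA is logical
    class(toy$R0000100[NA])   # indexing with NA keeps the column's class
    class(toy$R6389600[NA])

    # Option 1: handle each type separately with a typed NA
    by_type <- toy %>%
      mutate(
        across(where(is.integer), ~ if_else(. < 0L, NA_integer_, .)),
        across(where(is.double),  ~ if_else(. < 0,  NA_real_,    .))
      )

    # Option 2: .[NA] supplies an NA of the column's own class
    generic <- toy %>%
      mutate(across(everything(), ~ if_else(. < 0, .[NA], .)))

    str(by_type)
    str(generic)
    ```

    Both versions should leave the integer column as integer and the double column as double, with the negative entries replaced by NA.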