
How can I convert this old dplyr syntax?


I am new to dplyr and I'm having difficulty (i) understanding its syntax and (ii) converting code written for an old version into code that works with the current version (dplyr 1.0.2). In particular, I'm confused by the following two lines of code:

mutate_each(funs(replace(.,.=="NOT ANSWERED",NA))) %>%     
mutate_each(funs(ordered(.,c("NOT AT ALL","ONCE A WEEK", "2-4 TIMES PER WEEK/HALF THE TIME", "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS"))))

I think that the first line of code is supposed to replace all the "NOT ANSWERED" with NA.

Do you think the following conversion is appropriate?

mutate(across(everything(),~replace(., .== "NOT ANSWERED", NA)))

However, I don't understand what the second line of code is about. I believe that it's about creating some sort of ordered variable with "NOT AT ALL", "ONCE A WEEK", "2-4 TIMES PER WEEK/HALF THE TIME" and "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS" as levels.

Do you have any suggestions on what this line does and how to convert it to the new syntax with mutate(across())?

Some context

I'm trying to follow a tutorial on how to use the Bootnet R package. The following text comes from the first part of the tutorial

To download the dataset, go to: https://datashare.nida.nih.gov/study/nida-ctn-0015 and click on “CTN-0015 Data Files”. The relevant data file is called “qs.csv”, which can be loaded into R by using the default read.csv function:

FullData <- read.csv("qs.csv", stringsAsFactors = FALSE)

This loads the data in long format, which contains a column with subject id’s, a column with the names of the administered items, and a third column containing the item responses. For network analysis, we need the data to be in wide format. Furthermore, we need to assign that the response "NOT ANSWERED" indicates a missing response and other responses are ordinal. Finally, we need to extract relevant dataset at baseline measure for the PTSD symptom frequency scores. To do this, we can utilize the dplyr and tidyr R packages as follows:

# Load packages: 
library("dplyr") 
library("tidyr") 

# Frequency at baseline: 
Data <- FullData %>% 
        filter(EPOCH == "BASELINE",grepl("^PSSR\\d+A$",QSTESTCD)) %>% 
        select(USUBJID,QSTEST,QSORRES) %>% 
        spread(QSTEST, QSORRES) %>% 
        select(-USUBJID) %>% 
        mutate_each(funs(replace(.,.=="NOT ANSWERED",NA))) %>% 
        mutate_each(funs(ordered(.,c("NOT AT ALL","ONCE A WEEK", "2-4 TIMES PER WEEK/HALF THE TIME", "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS"))))

names(Data) <- seq_len(ncol(Data))

The tutorial keeps going in its second part.


Solution

  • ordered() creates an ordered factor whose levels follow the order in which you list them. Since both calls are applied to the same columns, you can combine them into a single function. Try:

    library(dplyr)
    
    vals <- c("NOT AT ALL","ONCE A WEEK", "2-4 TIMES PER WEEK/HALF THE TIME", "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS")
    
    Data <- FullData %>%
              #....
              #....
              #....
              # everything() is stated explicitly; it selects all columns
              mutate(across(everything(), ~ordered(replace(., .== "NOT ANSWERED", NA), vals)))
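
To see what the combined call does, here is a minimal sketch on a made-up data frame. The column names item1 and item2 and their values are hypothetical stand-ins for the wide-format tutorial data, not the real dataset:

```r
library(dplyr)

# Hypothetical toy data standing in for the wide-format tutorial data
toy <- data.frame(
  item1 = c("NOT AT ALL", "NOT ANSWERED", "ONCE A WEEK"),
  item2 = c("ONCE A WEEK", "NOT AT ALL", "NOT ANSWERED"),
  stringsAsFactors = FALSE
)

# Levels listed from lowest to highest frequency
vals <- c("NOT AT ALL", "ONCE A WEEK",
          "2-4 TIMES PER WEEK/HALF THE TIME",
          "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS")

result <- toy %>%
  # Step 1: replace() turns "NOT ANSWERED" into NA
  # Step 2: ordered() converts the column to an ordered factor with levels `vals`
  mutate(across(everything(),
                ~ ordered(replace(.x, .x == "NOT ANSWERED", NA), vals)))

str(result)                        # both columns are now ordered factors
is.na(result$item1)                # the "NOT ANSWERED" entry has become NA
result$item1[1] < result$item1[3]  # ordered factors can be compared by level
```

Note that ordered() already converts any value that is not among the supplied levels to NA, so the replace() step mainly makes the intent explicit.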