r, tidyverse, r-haven

read_sas cols_only: suppress error "Evaluation error: Column 2 must be named"


I have a long list of very large SAS files that I want to import using read_sas. To increase speed and reduce memory usage, I want to import only the columns I am interested in, using cols_only.

The trouble is, I have a long list of possible column names - but not every column is in my dataset. If I pass the full list to cols_only, I get the error:

Evaluation error: Column 2 must be named.

Is there a way to suppress this error, and encourage read_sas to do its best to import whatever variables it can from the list I have passed?


Solution

  • As @Andrew mentions in their comment, haven >= 2.2.0 adds a col_select argument to read_sas that accepts tidyselect helpers. To select columns that may not exist, use the helper one_of(), which warns about unknown names instead of raising an error:

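    library(haven)
    library(tidyselect)
    
    # Create a small SAS file to read back.
    # Note: write_sas() is deprecated in recent haven releases; it is used here only to build a test file.
    f <- tempfile()
    write_sas(mtcars, f)
    
    # One column that exists in the file and one that does not
    my_cols <- c("mpg", "i-don't-exist")
    read_sas(f, col_select = one_of(my_cols))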
    #> Warning: Unknown columns: `i-don't-exist`
    #> # A tibble: 32 x 1
    #>      mpg
    #>    <dbl>
    #>  1  21  
    #>  2  21  
    #>  3  22.8
    #>  4  21.4
    #>  5  18.7
    #>  6  18.1
    #>  7  14.3
    #>  8  24.4
    #>  9  22.8
    #> 10  19.2
    #> # ... with 22 more rows
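
With tidyselect >= 1.0.0, one_of() is superseded by any_of(), which drops missing names silently rather than warning. The sketch below applies the same idea to several files, as in the question. It assumes the example file iris.sas7bdat that ships with haven as a stand-in for the real data; the names paths, wanted and dfs, and the missing column "not_a_column", are made up for illustration.

```r
library(haven)
library(tidyselect)

# Example file bundled with haven; it stands in for the real SAS files.
path <- system.file("examples", "iris.sas7bdat", package = "haven")
paths <- c(path, path)  # hypothetical list of files to import

# Candidate columns; "not_a_column" is deliberately absent from the file.
wanted <- c("Sepal_Length", "Species", "not_a_column")

# any_of() keeps the names that exist in each file and ignores the rest,
# so every file is read with whatever subset of `wanted` it contains.
dfs <- lapply(paths, function(p) {
  read_sas(p, col_select = any_of(wanted))
})

# Inspect which columns were actually imported from the first file.
names(dfs[[1]])
```

Because the selection is resolved separately for each file, the resulting data frames may have different columns. Check names() on each one before combining them.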