Tags: r, import, tidyverse, fread, data-wrangling

Importing a selection of columns from dataframes


I am trying to import multiple .dta files without writing a separate import line for each one and without wasting loading time.

There are two challenges. First, each dataframe has its own idiosyncratic name. Think of it as multiple state names: Arizona, Alabama, Texas, etc.

The second challenge is that I only want to import a handful of columns. For instance, I just want the columns labeled state, id, and temperature. I don’t want to spend loading time on columns that I am going to de-select right away.

I don’t need to rbind these files once they are imported.

To restate: I want to import the columns state, id, and temperature from the .dta files Alabama, Arizona, and Texas.

Here is some sample data:

set.seed(100)
arizona <- data.frame(state= "AZ",
                        id= 1:100,
                        temperature= runif(100, min=40, max=80),
                        var1= runif(100, min=10, max=20),
                        var2= runif(100, min=50, max=70))

alabama <- data.frame(state= "AL",
                        id= 1:50,
                        temperature= runif(50, min=30, max=70),
                        var1= runif(50),
                        var2= runif(50, min=50, max=70))

texas <- data.frame(state= "TX",
                        id= 1:120,
                        temperature= runif(120, min=35, max=75),
                        var1= runif(120, min=10, max=20),
                        var2= runif(120, min=50, max=70))

Thank you,


Solution

  • My installed packages contain two functions for reading Stata files, but only haven's read_dta() has a column-selection argument (col_select). Something like this untested code:

    library(haven)
    # Read each file, keeping only the three wanted columns
    in_st_list <- lapply(paste0(c("Alabama", "Arizona", "Texas"), ".dta"),
                         read_dta,
                         col_select = all_of(c("state", "id", "temperature")))
    

    If I haven't made any syntax or logic errors, this returns a list of three data frames, one per file.
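
A slightly fuller sketch of the same idea, in case it helps to try it end to end. It assumes the .dta files sit together in one folder. The folder `dta_dir`, the stand-in data, and the names `keep_cols` and `dta_files` are made up for illustration and are not from the question. It finds the files with list.files() instead of typing each name, reads only the wanted columns, and names each list element after its file.

```r
library(haven)  # read_dta() and write_dta()

# Hypothetical setup: write small stand-in files to a temporary folder
# so the example runs without the real data.
dta_dir <- file.path(tempdir(), "dta_demo")
dir.create(dta_dir, showWarnings = FALSE)
set.seed(100)
demo <- list(
  Alabama = data.frame(state = "AL", id = 1:50,
                       temperature = runif(50, min = 30, max = 70),
                       var1 = runif(50)),
  Arizona = data.frame(state = "AZ", id = 1:100,
                       temperature = runif(100, min = 40, max = 80),
                       var1 = runif(100, min = 10, max = 20)),
  Texas   = data.frame(state = "TX", id = 1:120,
                       temperature = runif(120, min = 35, max = 75),
                       var1 = runif(120, min = 10, max = 20))
)
for (nm in names(demo)) {
  write_dta(demo[[nm]], file.path(dta_dir, paste0(nm, ".dta")))
}

# Columns to keep; all_of() raises an error if a file lacks one of them.
keep_cols <- c("state", "id", "temperature")

# Find every .dta file in the folder instead of typing each name.
dta_files <- list.files(dta_dir, pattern = "\\.dta$", full.names = TRUE)

# Read only the wanted columns from each file.
in_st_list <- lapply(dta_files, function(f) {
  read_dta(f, col_select = tidyselect::all_of(keep_cols))
})

# Name each list element after its file, e.g. "Alabama".
names(in_st_list) <- tools::file_path_sans_ext(basename(dta_files))

# Show the top-level structure of the resulting list.
str(in_st_list, max.level = 1)
```

Since the frames are not going to be combined, keeping them in a named list means each one can be reached as `in_st_list$Texas` or `in_st_list[["Texas"]]`. The `tidyselect::` prefix is only there so the sketch does not depend on which packages happen to be attached.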