
Extracting Stata labels in R when some variables are missing labels


I'm working with large Stata files with variable names and labels. I need these labels to understand what each variable is.

I have been using

df[] %>% map_chr(~attributes(.)$label)

to extract the variable names and associated labels. Unfortunately, some of the datasets have variables with no label at all.

This means that when I try the above code, I just get an error.

Error: Result 1 is not a length 1 atomic vector

Ideally, I'd have a way of setting every missing label to "NA" or to an empty string, so that the output still lists every variable and the unlabeled ones simply appear without a label.


Solution

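  • I feel like purrr's strictness is getting in the way of what you want here: map_chr() requires every result to be a length-1 character vector, but a variable with no label returns NULL. If you just lapply() (or purrr::map()), you'll get a list, which is perfectly nice to work with: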

    # get an example Stata dataset
    webuse::webuse("auto")
    
    # drop the label on `price`
    attr(auto$price, "label") <- NULL
    
    # get all of the labels as a list
    labels <- lapply(auto, attr, "label")
    

    This gives you:

    > str(labels)
    List of 12
     $ make        : chr "Make and Model"
     $ price       : NULL
     $ mpg         : chr "Mileage (mpg)"
     $ rep78       : chr "Repair Record 1978"
     $ headroom    : chr "Headroom (in.)"
     $ trunk       : chr "Trunk space (cu. ft.)"
     $ weight      : chr "Weight (lbs.)"
     $ length      : chr "Length (in.)"
     $ turn        : chr "Turn Circle (ft.) "
     $ displacement: chr "Displacement (cu. in.)"
     $ gear_ratio  : chr "Gear Ratio"
     $ foreign     : chr "Car type"
    

    You can unlist() that if you're willing to drop the variables that have no label (unlist() discards the NULL entries, so `price` disappears):

    > unlist(labels)
                        make                      mpg                    rep78                 headroom 
            "Make and Model"          "Mileage (mpg)"     "Repair Record 1978"         "Headroom (in.)" 
                       trunk                   weight                   length                     turn 
     "Trunk space (cu. ft.)"          "Weight (lbs.)"           "Length (in.)"     "Turn Circle (ft.) " 
                displacement               gear_ratio                  foreign 
    "Displacement (cu. in.)"             "Gear Ratio"               "Car type"
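
If you would rather keep every variable and mark the unlabeled ones as NA, as the question asks, something along these lines may work. This is only a sketch: `df` is a small made-up data frame standing in for a haven-imported Stata file, and `get_label` is a hypothetical helper name, not something provided by haven or purrr.

```r
# Made-up stand-in for an imported Stata dataset: `price` has no label.
df <- data.frame(
  make  = c("AMC Concord", "Buick Century"),
  price = c(4099, 4816),
  mpg   = c(22, 20),
  stringsAsFactors = FALSE
)
attr(df$make, "label") <- "Make and Model"
attr(df$mpg, "label")  <- "Mileage (mpg)"

# Hypothetical helper: return the "label" attribute, or NA if there is none.
get_label <- function(x) {
  lbl <- attr(x, "label", exact = TRUE)
  if (is.null(lbl)) NA_character_ else as.character(lbl)[1]
}

# vapply() insists on exactly one character value per column,
# so the result is a named character vector with one entry per variable.
labels <- vapply(df, get_label, character(1))

# Optional: a two-column lookup table of variable names and labels.
label_table <- data.frame(
  variable = names(labels),
  label    = unname(labels),
  stringsAsFactors = FALSE
)
print(label_table)
```

Replacing `NA_character_` with `""` in the helper would give empty strings instead of NA. If you prefer to stay in purrr, `map_chr(df, ~ attr(.x, "label") %||% NA_character_)` should behave the same way, assuming every existing label is a single string.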