I'm working with large Stata files with variable names and labels. I need these labels to understand what each variable is.
I have been using
df[] %>% map_chr(~attributes(.)$label)
to extract the variable names and associated labels. Unfortunately some of the datasets have variables that are missing any label (see picture below).
This means that when I try the above code, I just get an error.
Error: Result 1 is not a length 1 atomic vector
Ideally I'd have a way of either calling all the missing labels "NA" or nothing, so I could get an output like this:
That is, variables with a missing label would simply have no label, but would still be included in the output.
I feel like purrr's strictness is getting in the way of what you want here. If you just use lapply() (or purrr::map()), you'll get a list, which is perfectly nice to work with:
library(haven)
# get an example Stata dataset
# (assumed here: a local copy of Stata's auto.dta)
auto <- read_dta("auto.dta")
# drop the label on `price`
attr(auto$price, "label") <- NULL
# get all of the labels as a list
labels <- lapply(auto, attr, "label")
This gives you:
> str(labels)
List of 12
$ make : chr "Make and Model"
$ price : NULL
$ mpg : chr "Mileage (mpg)"
$ rep78 : chr "Repair Record 1978"
$ headroom : chr "Headroom (in.)"
$ trunk : chr "Trunk space (cu. ft.)"
$ weight : chr "Weight (lbs.)"
$ length : chr "Length (in.)"
$ turn : chr "Turn Circle (ft.) "
$ displacement: chr "Displacement (cu. in.)"
$ gear_ratio : chr "Gear Ratio"
$ foreign : chr "Car type"
You can unlist() that if you're willing to drop the variables that have no label (unlist() discards NULL elements):
> unlist(labels)
make mpg rep78 headroom
"Make and Model" "Mileage (mpg)" "Repair Record 1978" "Headroom (in.)"
trunk weight length turn
"Trunk space (cu. ft.)" "Weight (lbs.)" "Length (in.)" "Turn Circle (ft.) "
displacement gear_ratio foreign
"Displacement (cu. in.)" "Gear Ratio" "Car type"
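If you'd rather keep the unlabeled variables, as the question asks, one option is to substitute NA for a missing label before simplifying to a character vector. Here is a minimal sketch of that idea. It uses a small made-up data frame instead of the Stata file, and the column values and the get_label() helper are only for illustration:

```r
# Toy data frame standing in for a Stata import (hypothetical values).
df <- data.frame(
  make  = c("AMC Concord", "Buick Regal"),
  price = c(4099, 5189),
  mpg   = c(22, 20)
)
attr(df$make, "label") <- "Make and Model"
attr(df$mpg, "label")  <- "Mileage (mpg)"
# `price` is deliberately left without a label.

# Return the variable label, or NA_character_ when there is none.
# exact = TRUE stops attr() from partially matching another attribute
# whose name merely starts with "label" (such as "labels").
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)[1]
}

# vapply() requires exactly one string per column, so the result is a
# named character vector with one entry for every variable.
labels <- vapply(df, get_label, character(1))

# Optional: the same information as a two-column lookup table.
label_table <- data.frame(variable = names(labels), label = unname(labels))
```

With this, `price` is still present in `labels` and simply carries NA. The same substitution should also let purrr::map_chr() succeed, since every element is then a length-1 character vector.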