I have part of a big dataset. Many variables have value labels for values that are not present in this part of the dataset. I would like to remove these redundant value labels. I tried to do that in Stata using various approaches but did not succeed.
Apparently this does not work:
label drop X if X == 1
Added text: So far I have come up with the following solutions, which are not ideal because I will need to repeat this exercise again and again in future:
First (semi-manual):
fre var
di r(lab_valid)
label drop var
label define var 1 "Label 1" 2 "Label 2" 3 "label 3", modify
Second (X is a label code that needs to be kept; the problem is that I have several codes that need to be kept):
labellist var
local min = r(var_min)
local max = r(var_max)
forval i = `min'/`max' {
    if `i' != X {
        label define var `i' "", modify
    }
}
No "apparently" about it: that is not legal code, nor does it even make sense in principle. At best, label drop drops named sets of value labels, but the name of a value label set and the name of any variable it is attached to need not coincide unless you have set things up that way.
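To see that the variable name and the value label name are separate, here is a minimal sketch using the auto data shipped with Stata, where foreign carries the value label origin (as the log further down also shows). It assumes Stata 15 or later. Removing a single mapping rather than a whole set uses label define with an empty string and the modify option, the same idiom the question's second approach relies on.

```stata
sysuse auto, clear
// The value label attached to foreign is called origin, not foreign
local lbl : value label foreign
display "`lbl'"
label list `lbl'
// label drop takes names of value label sets, never values or conditions:
//   label drop origin   would remove the whole set
// Removing one mapping (here, value 0) uses an empty string with modify
label define `lbl' 0 "", modify
label list `lbl'
```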
This is dubious:
In most cases Stata doesn't use much memory to store value labels. Much of the point of value labels is that each set need only be stored once, however many variables and observations use it.
This kind of question suggests that the value labels were set up before you came along, so that each value would have a label ready if an observation with that value turned up. That was very possibly wise thinking.
This is dangerous:
The same set of value labels may be attached to more than one variable, so in principle you need to check usage across all the variables that share a particular set.
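A small made-up example shows the risk. The data, the variable names q1 and q2, and the label name agree are all hypothetical; the sketch assumes Stata 15 or later. One set is attached to two variables, and a value that never occurs in one of them still occurs in the other.

```stata
clear
input q1 q2
1 1
1 2
end
// One value label set, stored once, attached to two variables
label define agree 1 "Agree" 2 "Neutral" 3 "Disagree"
label values q1 q2 agree
// List every variable that uses the set before changing it
quietly ds, has(vallabel agree)
display "`r(varlist)'"
// Value 2 never occurs in q1 but does occur in q2, so pruning agree
// by looking at q1 alone would delete a label that q2 needs
count if q1 == 2
count if q2 == 2
```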
You need to worry about what might happen if you append or merge with similar datasets. That could lead to more mess than you want.
Less biting, but also worth mentioning, is that a value label for a value that isn't in the data might still be useful for graphical purposes.
So, I don't advise what you're thinking of. You could try a decode of each variable with value labels and then an encode based on those values. But the value labels wouldn't necessarily be in a desired order. By default encode would use alphabetical order, and you would end up with nonsense like 1 "Acceptable" 2 "Bad" 3 "Good" or 1 "Agree" 2 "Disagree" 3 "Neutral". It's even possible to imagine ending up with more value label sets than you started with, as encode by default creates a new set for each variable it generates.
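A minimal sketch of the decode and encode route, on hypothetical data and assuming Stata 15 or later, shows both problems: the unused label does disappear, but the codes are reassigned alphabetically, and a second value label set (named after the new variable) now exists alongside the original one.

```stata
clear
input rating
1
3
end
// Codes in a meaningful order; value 2 does not occur in this part of the data
label define ratinglbl 1 "Good" 2 "Acceptable" 3 "Bad"
label values rating ratinglbl
decode rating, generate(rating_str)
// encode assigns codes in alphabetical order of the strings it finds
encode rating_str, generate(rating2)
label list rating2
// rating2 maps 1 to "Bad" and 2 to "Good": "Good" has moved from 1 to 2
```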
There are other ways to do it properly, but it's a small project.
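As one illustration of what such a small project might involve, here is a minimal sketch, assuming Stata 15 or later. The program name dropunusedvl is hypothetical (not an official or community-contributed command). It checks every variable that shares a set before deleting a mapping, but it does nothing about the append and merge risks above, and it ignores extended missing values .a to .z.

```stata
* dropunusedvl: hypothetical name, a sketch only.
* Deletes mappings of one value label set for values that occur in none
* of the variables the set is attached to. Extended missings are left alone.
capture program drop dropunusedvl
program dropunusedvl
    version 15
    args lblname
    // every variable that uses this value label set
    quietly ds, has(vallabel `lblname')
    local vlist `r(varlist)'
    if "`vlist'" == "" exit            // set not attached to anything: leave it
    quietly label list `lblname'
    if missing(r(min)) exit            // no nonmissing values are labelled
    local min = r(min)
    local max = r(max)
    forvalues i = `min'/`max' {
        local ltext : label `lblname' `i', strict
        if `"`ltext'"' == "" continue   // no mapping for this value
        local used 0
        foreach v of local vlist {
            quietly count if `v' == `i'
            if r(N) > 0 local used 1
        }
        // an empty string with modify deletes the mapping
        if !`used' label define `lblname' `i' "", modify
    }
end

* Usage on hypothetical data: value 3 occurs in neither q1 nor q2
clear
input q1 q2
1 1
1 2
end
label define agree 1 "Agree" 2 "Neutral" 3 "Disagree"
label values q1 q2 agree
dropunusedvl agree
label list agree
```

Checking all variables that share the set, rather than one variable at a time, is what keeps "Neutral" here even though it never occurs in q1.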
Executive summary: Sorry, but that doesn't sound like a good idea.
EDIT: This is hacked out of dataex. It should work for various versions < 15.
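*! 1.0.0 NJC 11apr2018
program showvaluelabelsused
    version 15
    syntax [varlist]
    // names of the value label sets attached to variables in varlist
    quietly ds `varlist', has(vallabel)
    foreach v in `r(varlist)' {
        local l : value label `v'
        local vlabels : list vlabels | l
    }
    foreach vl in `vlabels' {
        local alllevels
        // all variables using this set (not just those in varlist)
        qui ds , has(vallabel `vl')
        local vlist `r(varlist)'
        foreach v in `vlist' {
            // collect the distinct values, including missings, present in the data
            qui levelsof `v', local(levels) missing
            local alllevels : list alllevels | levels
            dis as res "label values `v' `vl'"
        }
        // show a label define command for each value present that has a label
        foreach n in `alllevels' {
            local ltext : label `vl' `n', strict
            if `"`ltext'"' != "" {
                // use compound double quotes if the text itself contains "
                if strpos(`"`ltext'"',char(34)) dis as res `"label def `vl' `n' `"`ltext'"', modify"'
                else dis as res `"label def `vl' `n' "`ltext'", modify"'
            }
        }
    }
end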
. sysuse auto, clear
(1978 Automobile Data)
. showvaluelabelsused
foreign
label values foreign origin
label def origin 0 "Domestic", modify
label def origin 1 "Foreign", modify
. keep if foreign
(52 observations deleted)
. showvaluelabelsused
label values foreign origin
label def origin 1 "Foreign", modify
. webuse nlswork, clear
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. showvaluelabelsused
label values race racelbl
label def racelbl 1 "white", modify
label def racelbl 2 "black", modify
label def racelbl 3 "other", modify
. keep if race == 2
(20,483 observations deleted)
. showvaluelabelsused
label values race racelbl
label def racelbl 2 "black", modify
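The program only displays commands; it changes nothing. One way to act on the output is sketched below, assuming (as the output above shows) that race is the only variable using racelbl, and that webuse can reach the Stata Press website. The whole set is dropped and then redefined from the displayed mapping.

```stata
webuse nlswork, clear
keep if race == 2
// Redefine racelbl from scratch with only the mapping that was displayed
label drop racelbl
label define racelbl 2 "black"
label values race racelbl
label list racelbl
```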