Given a local macro that contains a string of levels separated by a comma (","), a comma and a space (", "), or only a space (" "), is there a simple way to extract the first N levels (or words) of this local macro?
The string would look like "12, 123, 1321, 41", "12,123,1321,41", or "12 123 1321 41".
Basically, I would be happy with a version of the macro function "word # of string" that would work more or less like "word 1/N of string". (See "Macro functions for parsing" on p. 12 of "Macro definition and manipulation".)
For more context, I am working with the output of levelsof with the local() and sep() options, so I can choose whichever separator is easiest to work with. I want to pass the resulting levels as arguments to the inlist() function. The following usually works, but inlist() only takes up to 250 arguments. That is why I would like to extract chunks of 250 words from the results of levelsof:
sysuse auto, clear
levelsof mpg if trunk > 20, local(levels) sep(", ")
list if inlist(mpg, `levels')
I have figured out a non-simple way of achieving that, but it does not look good, and I am wondering if there is a simple, built-in way of doing the same.
sysuse auto, clear
levelsof mpg if trunk > 20, local(levels) sep(", ")
scalar number_of_words = 3
forvalues i = 1 (1) `=number_of_words' {
    local word_i = `i'
    // with sep(", "), each word still carries its trailing comma
    local this_level : word `word_i' of `levels'
    local list_of_levels = "`list_of_levels'`this_level'"
    di as text "loop: `i'"
    di as text "this level: `this_level'"
    di as text "list of levels so far: `list_of_levels'"
}
di "`list_of_levels'"
// trim trailing comma
local trimmed_list_of_levels = substr( "`list_of_levels'" , 1 , strlen( "`list_of_levels'" )-1)
di "`trimmed_list_of_levels'"
list make mpg price trunk if inlist(mpg, `trimmed_list_of_levels')
. sysuse auto, clear
(1978 Automobile Data)
.
. levelsof mpg if trunk > 20, local(levels) sep(", ")
12, 15, 17, 18
. scalar number_of_words = 3
. forvalues i = 1 (1) `=number_of_words' {
2. local word_i = `i'
3. local this_level : word `word_i' of `levels'
4. local list_of_levels = "`list_of_levels'`this_level'"
5.
. di as text "loop: `i'"
6. di as text "this level: `this_level'"
7. di as text "list of levels so far: `list_of_levels'"
8. }
loop: 1
this level: 12,
list of levels so far: 12,
loop: 2
this level: 15,
list of levels so far: 12,15,
loop: 3
this level: 17,
list of levels so far: 12,15,17,
.
. di "`list_of_levels'"
12,15,17,
.
. // trim trailing comma
. local trimmed_list_of_levels = substr( "`list_of_levels'" , 1 , strlen( "`list_of_levels'" )-1)
.
. di "`trimmed_list_of_levels'"
12,15,17
. list make mpg price trunk if inlist(mpg, `trimmed_list_of_levels')
     +------------------------------------------+
     | make                 mpg    price   trunk |
     |------------------------------------------|
  2. | AMC Pacer             17    4,749      11 |
  5. | Buick Electra         15    7,827      20 |
 23. | Dodge St. Regis       17    6,342      21 |
 26. | Linc. Continental     12   11,497      22 |
 27. | Linc. Mark V          12   13,594      18 |
     |------------------------------------------|
 31. | Merc. Marquis         15    6,165      23 |
 53. | Audi 5000             17    9,690      15 |
 74. | Volvo 260             17   11,995      14 |
     +------------------------------------------+
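For comparison, the loop above can be written more compactly. This is only a sketch: it assumes the default space separator of levelsof (no sep() option), and the macro names n and first_n are made up for illustration. It collects the first n words and then converts the spaces to commas with the subinstr macro function, so no trailing comma has to be trimmed.

```stata
sysuse auto, clear
// default separator is a single space, e.g. "12 15 17 18"
levelsof mpg if trunk > 20, local(levels)

// hypothetical: number of levels to keep
local n = 3

// collect the first n words, space-separated
local first_n
forvalues i = 1/`n' {
    local w : word `i' of `levels'
    local first_n `first_n' `w'
}

// normalize spacing, then turn "12 15 17" into "12,15,17" for inlist()
local first_n : list retokenize first_n
local first_n : subinstr local first_n " " ",", all

display "`first_n'"
list make mpg price trunk if inlist(mpg, `first_n')
```

If n is larger than the number of levels, the extra words come back empty and are simply dropped, so the result is the full list.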
The following, for example, does not work. It returns error 130, "expression too long".
clear
set obs 1000
gen id = _n
gen x1 = rnormal()
sum *
levelsof id if x1>0, local(levels) sep(", ")
sum * if inlist(id, `levels')
clear
set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()
sum *
levelsof id if x1>2, local(levels) sep(", ")
sum * if x1>2 // if threshold is small enough, there will be too many values for inlist()
sum * if inlist(id, `levels')
Using your additional example as a basis, you could use egen with the max() function to create a flag that is 1 for every observation of an id that has any case where the x1 value is above a certain threshold. For example:
clear
set seed 2021
set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()
sum *
levelsof id if x1>2, local(levels) sep(", ")
sum * if x1>2 // if threshold is small enough, there will be too many values for inlist()
sum * if inlist(id, `levels')
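// This will do the same thing without inlist():
// flag the observations that are above the threshold
gen over_threshold = x1>2
// spread the flag to every observation that shares the id
egen id_over_thresh = max(over_threshold), by(id)
sum * if id_over_thresh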
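If the levels themselves are still needed in inlist(), one possible workaround is to split them into chunks and join several inlist() calls with |. The following is a minimal sketch under assumptions: chunk_size, cond, chunk, k, and in_chunks are hypothetical names, and the chunk size of 50 is an arbitrary value chosen to stay below the inlist() limit mentioned in the question.

```stata
clear
set seed 2021
set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()
// default separator is a single space
levelsof id if x1 > 2, local(levels)

// hypothetical chunk size; any value below the inlist() limit
local chunk_size = 50
// cond will hold " | inlist(id,...) | inlist(id,...)"
local cond
// chunk will hold ",a,b,c" for the chunk being built
local chunk
local k = 0
foreach w of local levels {
    local chunk "`chunk',`w'"
    local ++k
    if mod(`k', `chunk_size') == 0 {
        local cond "`cond' | inlist(id`chunk')"
        local chunk
    }
}
// flush the last, possibly shorter, chunk
if "`chunk'" != "" local cond "`cond' | inlist(id`chunk')"

// the leading 0 makes "0 | inlist(...) | ..." a valid expression
gen byte in_chunks = 0 `cond'
sum * if in_chunks
```

The flag in_chunks should select the same observations as id_over_thresh above; the egen approach remains simpler when the levels are not needed for anything else.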