stata, stata-macros

Multiple locals in a foreach loop


I have a dataset with multiple subgroups (variable economist) and dates (variable temps99).

I want to run tabsplit, which does not accept the by or bysort prefix. So I wrote a loop over a local macro to apply tabsplit to each subgroup in my data.

For example:

levelsof economist, local(liste)

foreach gars of local liste {
    display "`gars'"
    tabsplit SubjectCategory if economist=="`gars'", p(;) sort 
    return list
    replace nbcateco = r(r) if economist == "`gars'"
}

For each subgroup, Stata runs tabsplit, and I store the count it leaves in r(r) in the variable nbcateco.

I did the same for dates, so that I can follow how r(r) evolves over time:

levelsof temps99, local(liste23)

foreach time of local liste23 {
    display "`time'"
    tabsplit SubjectCategory if temps99 == "`time'", p(;) sort
    return list
    replace nbcattime = r(r) if temps99 == "`time'"
}

Now I want to do this for each combination of subgroup (economist) and date (temps99). I tried several combinations of loops, but I am not very good with macros (yet?).

What I want is r(r) for each of my subgroups over time.


Solution

  • This is an example of the XY problem, I think. See http://xyproblem.info/

    tabsplit is a command in the package tab_chi from SSC. I have no negative feelings about it, as I wrote it, but it seems quite unnecessary here.

    You want to count categories in a string variable: semi-colons are your separators. So count semi-colons and add 1.

    local SC SubjectCategory
    gen NCategory = 1 + length(`SC') - length(subinstr(`SC', ";", "", .)) 
    

    Then table or tabstat (for example) lets you explore NCategory further by groups of interest.
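    A minimal sketch of that follow-up step, on a small made-up dataset (the economist names, dates and categories are invented for illustration; temps99 is stored as a string, as the question's code compares it with a quoted value). Note that summing NCategory within a group counts category mentions, which need not equal the number of distinct categories in that group.

```stata
* Hypothetical toy data standing in for the real dataset
clear
input str10 economist str4 temps99 str40 SubjectCategory
"smith" "1999" "Economics;Finance"
"smith" "1999" "Economics"
"smith" "2000" "Economics;Finance;Law"
"jones" "1999" "Law;Sociology"
end

* Number of categories per observation: number of semi-colons + 1
local SC SubjectCategory
gen NCategory = 1 + length(`SC') - length(subinstr(`SC', ";", "", .))

* Summary by one grouping variable
tabstat NCategory, by(economist) statistics(n mean sum)

* Total category mentions for each economist x date combination
bysort economist temps99: egen totcat = total(NCategory)
list economist temps99 NCategory totcat, sepby(economist temps99) noobs
```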

    To see the counting idea, consider 3 categories with 2 semi-colons.

    . display length("frog;toad;newt")
    14
    
    . display length(subinstr("frog;toad;newt", ";", "", .))
    12
    

    If we replace each semi-colon with an empty string, the change in length is the number of semi-colons deleted, here 14 - 12 = 2. Adding 1 then gives the number of categories, 3. Note that we don't have to change the variable to do this. See also this paper.
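    The same arithmetic can be checked directly in Stata. The second part is a possible refinement, not part of the answer above: it assumes some observations may have an empty SubjectCategory, which the formula would still count as 1 category, so cond() maps those to 0.

```stata
* The counting idea on the single string used above
local s "frog;toad;newt"
local nsemi = length("`s'") - length(subinstr("`s'", ";", "", .))
local ncat = `nsemi' + 1
display "semicolons = `nsemi', categories = `ncat'"
* shows: semicolons = 2, categories = 3

* Assumed edge case: an empty string has no semi-colons, so the plain
* formula would give 1; cond() returns 0 for missing (empty) strings
clear
input str40 SubjectCategory
"frog;toad;newt"
""
end
gen NCategory = cond(missing(SubjectCategory), 0, 1 + length(SubjectCategory) - length(subinstr(SubjectCategory, ";", "", .)))
list, noobs
```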

    That said, a way to extend your approach might be:

    egen class = group(economist temps99), label 
    su class, meanonly 
    local nclass = r(max)   // groups are numbered 1, 2, ..., so the maximum is the number of groups
    gen result = . 
    
    forval i = 1/`nclass' {
        di "`: label (class) `i''" 
        tabsplit SubjectCategory if class == `i', p(;) sort
        return list
        replace result = r(r) if class == `i'
    }
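    The loop above needs tabsplit (package tab_chi, from SSC). To show its mechanics with official commands only, here is a hedged sketch on invented data: split breaks SubjectCategory at the semi-colons, reshape puts one category mention per row, and tabulate stands in for tabsplit. The question's code reads r(r) after tabsplit, and tabulate also leaves the number of rows of its table, that is, the number of distinct categories, in r(r). The variable names obs, cat, j and first are illustrative choices.

```stata
* Hypothetical toy data
clear
input str10 economist str4 temps99 str40 SubjectCategory
"smith" "1999" "Economics;Finance"
"smith" "1999" "Economics"
"smith" "2000" "Economics;Finance;Law"
"jones" "1999" "Law;Sociology"
end

* One row per category mention: split at ";" then reshape long
gen long obs = _n
split SubjectCategory, parse(;) generate(cat)
reshape long cat, i(obs) j(j)
drop if missing(cat)

* Same loop structure as above, over economist x date groups
egen class = group(economist temps99), label
su class, meanonly
local nclass = r(max)
gen result = .
forval i = 1/`nclass' {
    di "`: label (class) `i''"
    quietly tabulate cat if class == `i'
    replace result = r(r) if class == `i'
}

* Show one line per group
egen first = tag(class)
list economist temps99 result if first, noobs
```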
    

    Using statsby would be even better. See also this FAQ.
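    A minimal statsby sketch, assuming the categories are already one per row (as a split and reshape step would produce; the long-form data here are invented and typed in directly). statsby runs tabulate once per economist-by-date group and keeps r(r) from each run. With tab_chi installed, the analogous call on the original data would presumably be statsby ncat=r(r), by(economist temps99) clear: tabsplit SubjectCategory, p(;) but that is untested here.

```stata
* Hypothetical toy data, one category mention per row
clear
input str10 economist str4 temps99 str20 cat
"smith" "1999" "Economics"
"smith" "1999" "Finance"
"smith" "1999" "Economics"
"smith" "2000" "Economics"
"smith" "2000" "Finance"
"smith" "2000" "Law"
"jones" "1999" "Law"
"jones" "1999" "Sociology"
end

* Run tabulate for each economist x date group; the data in memory are
* replaced by one row per group holding r(r) as ncat
statsby ncat=r(r), by(economist temps99) clear: tabulate cat

list, noobs
```

    Because the clear option replaces the data in memory, you may prefer preserve/restore or the saving() option when working with the real dataset.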