
forvalues and xtile in Stata


What do the last two lines do? As far as I understand, they loop through the list h_nwave and calculate weighted quantiles where syear2digit == `nwave', i.e. they compute 5 quantiles for each year. But I'm not sure my understanding is correct. Also, is this equivalent to using the group() function?

h_nwave      "91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15"

generate        quantile_ip = .
forvalues number = 1(1)15 {
    local       nwave : word `number' of `h_nwave'
    xtile       quantile_ip_`nwave' = a_ip if syear2digit == `nwave' [ w = weight ], nq(5)
    replace     quantile_ip = quantile_ip_`nwave' if syear2digit == `nwave'
}

I am trying to convert this into R with a for loop, mutate, xtile (from the statar package) and case_when. However, so far I cannot find a suitable way to get a similar result.


Solution

  • There is no source or context for this code.

    Detail: The first command is truncated and presumably should have been

    local h_nwave 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
    

    Detail: The list contains 25 values, presumably corresponding to years 1991 to 2015. But the forvalues loop runs only from 1 to 15, so it picks up only the first 15 words of the list and we are only looking at 91 to 05.
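
    The mismatch between the list and the loop range can be checked with a small sketch. It is written in Python rather than Stata purely for illustration, and it assumes the list is exactly the one shown in the question; the names visited and skipped are made up for this example.

```python
# The wave list from the question, split into words like a Stata macro list.
h_nwave = ("91 92 93 94 95 96 97 98 99 00 01 02 03 "
           "04 05 06 07 08 09 10 11 12 13 14 15").split()

# forvalues number = 1(1)15 visits positions 1..15 (Stata counts words from 1).
visited = [h_nwave[number - 1] for number in range(1, 16)]

# Everything after the 15th word is never touched by the loop.
skipped = h_nwave[15:]

print(len(h_nwave))   # 25
print(visited[-1])    # 05
print(skipped[0])     # 06
```

    So for waves 06 to 15 the variable quantile_ip would stay missing unless the loop range is extended to 25.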

    Main idea: xtile assigns observations to quintile bins on variable a_ip, with weights. So the lowest 20% of observations (taking weighting into account) should be in bin 1, and so on. In practice observations with the same value must be assigned to the same bin, so 20-20-20-20-20 splits are not guaranteed, quite apart from the small print of whether sample size is a multiple of 5. So the result is assignment to bins 1 to 5, and not quintiles themselves, or any other kind of quantiles.

    This is done separately for each survey wave.
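
    The logic of weighted binning within each wave can be sketched as follows. This is an illustration in Python, not a port of xtile: it assumes a simplified cutpoint rule (the smallest value whose cumulative weight reaches k/5 of the total weight), which need not match xtile's exact definition in every case. The function names weighted_bins and bins_by_wave and the toy data are hypothetical.

```python
from collections import defaultdict


def weighted_bins(values, weights, nq=5):
    """Assign each value to a bin 1..nq using weighted cutpoints (simplified rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = sum(weights)
    cuts = []
    for k in range(1, nq):
        target = total * k / nq
        running = 0
        for i in order:
            running += weights[i]
            if running >= target:
                # Smallest value whose cumulative weight reaches the target.
                cuts.append(values[i])
                break
    # Bin = 1 + number of cutpoints strictly below the value,
    # so observations with equal values always share a bin.
    return [1 + sum(v > c for c in cuts) for v in values]


def bins_by_wave(rows, nq=5):
    """Apply weighted_bins separately within each survey wave."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row["syear2digit"]].append(idx)
    result = [None] * len(rows)
    for idxs in groups.values():
        bins = weighted_bins([rows[i]["a_ip"] for i in idxs],
                             [rows[i]["weight"] for i in idxs], nq)
        for i, b in zip(idxs, bins):
            result[i] = b
    return result


# Toy data: wave 91 has distinct values, wave 92 has ties,
# wave 93 has one observation carrying most of the weight.
rows = (
    [{"syear2digit": 91, "a_ip": v, "weight": 1} for v in (10, 20, 30, 40, 50)]
    + [{"syear2digit": 92, "a_ip": v, "weight": 1} for v in (5, 5, 5, 5, 9)]
    + [{"syear2digit": 93, "a_ip": v, "weight": w} for v, w in ((1, 8), (2, 1), (3, 1))]
)

print(bins_by_wave(rows))
# [1, 2, 3, 4, 5, 1, 1, 1, 1, 5, 1, 5, 5]
```

    Wave 91 splits evenly, while the ties in wave 92 and the heavy weight in wave 93 leave bins 2 to 4 empty, which is why equal-sized bins are not guaranteed.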

    The xtile command is documented for everyone at https://www.stata.com/manuals/dpctile.pdf regardless of personal or workplace access to Stata.

    In R, you may well be able to produce quintile bins for all survey years at once. I have no idea how to do that.

    Otherwise put, the loop arises because xtile doesn't work on separate subsets in one command call. There are community-contributed Stata commands that allow that. This kind of topic is much discussed on Statalist.