Search code examples
sortinggroup-bystatapostfile

Stata. How to post a matrix of by-sort? Or how to return values of a matrix?


I have the following initial.dta data set:

obs. ID    city       salary
1    123   Normal       100
2    124   Paris        120
3    125   NY           130
4    122   .a           155
5    120   Paris        100
6    128   Chicago      150
................

Some values of city and salary may be missing (.a). I need to calculate the average salary for each possible city value and post it to a different Stata .dta, so that the new Stata file would look like this:

       city       averageSalary
       Normal       120
       Paris        110
       NY           150
       .a           108
       Chicago      160
         .................

I tried this:

clear

postfile myPost city averageIncome using target.dta

    use initial, clear
    sort city
    by city: sum salary, meanonly
    post myPost (`r(city)')  (`r(mean)') // This line does not execute.

postclose myPost    

What is the problem in here? Please, anybody answer this question. Or leave a comment...


Solution

  • You are basing this, I imagine, on my answer to How to retrieve data from multiple Stata files?

    There are three fundamental problems with your code.

    • When you execute

      by city: sum salary, meanonly 
      

    Stata runs through the different values of city, but only the results for the last value will be left in memory. So, although the results you want will be displayed, most of them will not be available, and this command is almost useless for your purpose.

    • You are guessing that there is something called r(city) available to you when you post. That guess is wrong. See the documentation for summarize to see what else is available in memory afterwards.

    • You are misunderstanding what post does. In essence you need to post each row (case, observation, record) of the new results file at a time, typically within a loop. You don't post the whole set of results in one go.

    Fortunately for you and everyone else wanting to do this, there is a much simpler way of getting what you want.

     use initial, clear 
     collapse salary, by(city) 
     save target.dta 
    

    collapse collapses to means by default, but

     collapse (mean) salary, by(city) 
    

    is more explicit syntax.

    See also Maarten Buis's answer to the thread cited here (link above). You can use a different variable name if you want: see the syntax for collapse.