regression, stata, rolling-computation

Speeding up rolling regressions in Stata


Should I avoid the rolling command and code rolling regressions manually? Or am I better off creating a giant panel with overlapping entries and using statsby, i.e., giving each window its own by entry? In R I can pre-split the data into a list of data frames, which I think speeds up subsequent operations.

When I first switched from R to Stata a month ago I asked this on Statalist and the consensus was that it should take a long time. I coded and compiled OLS in Mata and noticed no speed improvement (actually, a slight worsening).

Rolling regressions seem to be a common technique, and Stata seems pretty sophisticated; are most researchers running these regressions for 1+ days? Or are they using SAS for these calculations? For example, I run the following on the Compustat database from 1975 to 2010 (about 30,000 regressions) and it takes about 12 hours.

rolling arbrisk = (e(rss) / e(N)), window(48) stepsize(12) ///
         saving(arbrisk, replace) nodots: regress r1 ewretd

Solution

  • I think the people on Statalist are right when they say that this should take a long time. You are running 30,000 regressions on a large number of observations.

    If you want to know where Stata is spending its time, you can use the profiler command.

    profiler clear
    profiler on
    rolling arbrisk = (e(rss) / e(N)), window(48) stepsize(12) ///
         saving(arbrisk, replace) nodots: regress r1 ewretd
    profiler off
    profiler report
    

    I doubt that creating a giant panel will help: you are likely to run into memory problems. You should check beforehand how big your panel will be and how much memory it will take:

    http://www.stata.com/support/faqs/data/howbig.html
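
    As a rough sketch of that check: with window(48) and stepsize(12), each observation falls into at most 48 / 12 = 4 windows, so a stacked panel holds roughly four copies of the data. The lines below assume the original data set is already in memory and ignore the extra window identifier and any overhead, so treat the result as a lower bound rather than an exact figure.

    * r(N) and r(width) are stored by describe: observations and bytes per observation
    quietly describe
    local n_obs = r(N)
    local bytes_per_obs = r(width)
    * assumption: each observation is copied into window/stepsize = 48/12 windows
    local copies = 48 / 12
    display "approx. stacked panel size (MB): " %9.1f (`copies' * `n_obs' * `bytes_per_obs' / 1024^2)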

    I am not surprised that using a self-coded OLS routine does not improve performance. The regress command is a so-called built-in command and is already pretty efficient. It will be hard to do better.

    As far as SAS is concerned, run a couple of regressions in SAS and check how much time it takes. Do the same in Stata. My experience has been that Stata's regress is a bit faster than proc reg in SAS.
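
    On the Stata side, one way to time a few regressions is the timer command. This is only a sketch: it assumes the data are in memory with at least 48 observations, and it reuses the same 48-observation window 100 times (an arbitrary count) instead of rolling through the sample.

    timer clear 1
    timer on 1
    forvalues i = 1/100 {
        * same window each time; only the cost of regress itself is measured
        quietly regress r1 ewretd in 1/48
    }
    timer off 1
    * reports the accumulated seconds for timer 1
    timer list 1

    Multiplying the per-regression time by 30,000 gives a floor for the full run; whatever remains of the 12 hours is overhead outside regress.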