variables, time-series, stata, generate

How to manually calculate Seasonal Variable?


I came up with a regression model in Stata, X = L7.X + S6.X, where L7 is lag 7 and S6 is season 6 (the 6th day in my daily data). The model output is not the concern here; my problem is the seasonality variable itself. When I try to plug in numbers to do forecasting, I cannot interpret S6.X properly, and I don't know how Stata generates S6.X.

Here are the detailed variables: [image: DETAILED]

What is the formula for S6.X, so that I can calculate it manually? I asked ChatGPT, and it came up with this answer:

    Generate a quarterly seasonality variable
    gen time = _n
    gen S6.quarterly = sin((2*_pi()*time)/6)

But this formula gives values ranging from -1 to 1, whereas in my data the variable ranges from -2 to 2, so I don't think it is reliable.

How can I manually calculate a seasonality variable (one that starts with S.) in Stata?


Solution

  • ChatGPT is feeding you garbage and wasting your time. Or rather, it is what it is and you wasted your time asking something where it can only guess.

    If you want to model seasonality using sine (and cosine) curves, then the operator S. in Stata is irrelevant. You should know whether that is what you want to do.
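
    For contrast, here is a minimal sketch of what sine and cosine seasonal terms could look like in Stata. It is only an illustration: the 7-day period, the 28 observations, and the names sin1 and cos1 are assumptions, not anything taken from the question. Note that Stata's constant is _pi, without parentheses.

    ```stata
    * Sketch only: sine/cosine seasonal terms, assuming a 7-day cycle
    clear
    set obs 28
    gen t = _n
    tsset t

    * period is an assumed value; change it to match your own cycle
    local period = 7
    gen sin1 = sin(2 * _pi * t / `period')
    gen cos1 = cos(2 * _pi * t / `period')

    * both terms are bounded by -1 and 1 by construction
    summarize sin1 cos1
    ```

    Terms like these are bounded by -1 and 1 by construction and repeat every period, which is a quite different thing from what S6. produces.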

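    Conversely, S. is a seasonal difference operator, not a sine curve. S.y gives you the difference from the previous value, y at time t minus y at time t - 1, and S6.y gives you the difference between values 6 time periods apart, y at time t minus y at time t - 6. Time series operators are documented under help tsvarlist.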

    You can run this script yourself to see what it does.

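    * toy series: y rises by 2 each period
    clear
    set obs 12
    gen y = 1 + 2 * _n
    gen t = _n
    * declare t as the time variable so that time series operators work
    tsset t

    * S.y is y minus the previous value; S6.y is y minus the value 6 periods back
    gen Sy = S.y
    gen S6y = S6.y

    list, sep(0)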
    
         +--------------------+
         |  y    t   Sy   S6y |
         |--------------------|
      1. |  3    1    .     . |
      2. |  5    2    2     . |
      3. |  7    3    2     . |
      4. |  9    4    2     . |
      5. | 11    5    2     . |
      6. | 13    6    2     . |
      7. | 15    7    2    12 |
      8. | 17    8    2    12 |
      9. | 19    9    2    12 |
     10. | 21   10    2    12 |
     11. | 23   11    2    12 |
     12. | 25   12    2    12 |
         +--------------------+
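
    To calculate S6.y by hand, subtract the value from 6 periods earlier. The sketch below rebuilds the toy series and checks two manual versions against the built-in operator. The names manual_lag and manual_sub are made up for illustration, and the subscript version assumes a single series sorted by time with no gaps.

    ```stata
    * rebuild the toy series from the script above
    clear
    set obs 12
    gen y = 1 + 2 * _n
    gen t = _n
    tsset t

    * built-in seasonal difference
    gen S6y = S6.y

    * manual version 1: the lag operator, which respects tsset and gaps
    gen manual_lag = y - L6.y

    * manual version 2: explicit subscripting
    * (assumes one series, sorted by t, with no gaps)
    gen manual_sub = y - y[_n-6]

    * assert stops with an error if any observation disagrees
    assert S6y == manual_lag
    assert S6y == manual_sub
    ```

    The first 6 observations are missing in all three variables because there is no value 6 periods back to subtract. With panel data or gaps in the time variable, the lag-operator version is the safer of the two manual forms.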
    

    Separately, it is hard to comment on your data example, as it's an image that doesn't allow copy and paste. Also, you tell us nothing about your time variable or whether you have a single time series or panel (longitudinal) data. If this answer doesn't help (much), you probably need to revise your question with more details.