Search code examples
datestatalag

Creating lag values for a given set of days


so I have certain dates for each ticker In my data, for example:

ticker day indicator
AAPL 21.04.2023 4.42
AAPL 22.04.2023 2.22
AAPL 23.04.2023 1.42
AAPL 27.04.2023 7.42
AAPL 28.04.2023 8.42
AAPL 31.04.2023 10.42
GOOGL 21.04.2023 2.42
GOOGL 22.04.2023 3.42
GOOGL 25.04.2023 10.42
GOOGL 27.04.2023 6.42

I'm trying to create a lag variable on my custom dates. so it will look like this:

ticker day indicator lag_indicator
AAPL 21.04.2023 4.42 .
AAPL 22.04.2023 2.22 4.42
AAPL 23.04.2023 1.42 2.22
AAPL 27.04.2023 7.42 1.42
AAPL 28.04.2023 8.42 7.42
AAPL 30.04.2023 10.42 8.42
GOOGL 21.04.2023 2.42 .
GOOGL 22.04.2023 3.42 2.42
GOOGL 25.04.2023 10.42 3.42
GOOGL 27.04.2023 6.42 6.42

I tried using stata business calendar with:

// create Nasdaq calander 
bcal create nasdaq, from(date_var) 
gen bizdate = bofd("nasdaq",date_var)
bcal load nasdaq
format %tbnasdaq bizdate


xtset ticker_encoded bizdate

But it didn't seem to work, As say 30.04.2023 and 27.04.2023 doesn't have any lag values.

Any help would be much appreciated!


Solution

  • I have not used bcal. If you do not need to use that tool, here is a reproducible example to get the output you want.

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 ticker str10 day float indicator
    "AAPL"  "21.04.2023"  4.42
    "AAPL"  "22.04.2023"  2.22
    "AAPL"  "23.04.2023"  1.42
    "AAPL"  "27.04.2023"  7.42
    "AAPL"  "28.04.2023"  8.42
    "AAPL"  "30.04.2023" 10.42
    "GOOGL" "21.04.2023"  2.42
    "GOOGL" "22.04.2023"  3.42
    "GOOGL" "25.04.2023" 10.42
    "GOOGL" "27.04.2023"  6.42
    end
    
    * Read the dates into Stata date format
    gen day2 = date(day, "DMY")
    format day2 %d
    
    * day2 sorts correctly such that "17.04.2023" is not sorted before "2.04.2023"
    * as any string starting with 1 is sorted before any string starting with 2,
    * no matter what the string represents
    sort ticker day2
    
    * Get indicator from previsous row ([_n-1]) if ticker is same on previous row
    gen lag_indicator = indicator[_n-1] if ticker == ticker[_n-1]