
By household, keep data only if observations started after Feb. 2000 - Stata


I am working in Stata and have data that list portfolios (houseID), the year and month, the stockID, and the stock's return. The data span several years and look like this:

My data

I am essentially trying to isolate a sub-sample of the data. I would like to keep only those houses and their data if their first portfolio observation was in February 2000. In the above data, I'd like to drop houses 223 and 382 and only keep the data for 448.

My first attempt was to do something like:

by HouseID: keep if....

but I am continually botching it. Does anyone have any ideas? Thanks for the help!!


Solution

    clear all
    set more off
    
    input ///
    houseid year month
    223 1997 1
    223 1997 2
    223 1998 1
    223 2000 1
    223 2000 2
    223 2000 3
    448 2000 2
    448 2000 3
    end
    
    list
    
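    * Sort by year and month within each household, then keep a household
    * only if its first (earliest) observation is February 2000
    bysort houseid (year month): keep if year[1] == 2000 & month[1] == 2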
    
    list
    

    keep deletes the unwanted observations from the data in memory. Alternatively, you can mark the subsample of interest with an indicator variable and restrict later commands to it. For example:

    bysort houseid (year month): gen ok = year[1] == 2000 & month[1] == 2
    
    <some command> if ok
    

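As a concrete version of the marking approach, here is a minimal sketch. The small dataset is made up for illustration, and count and list simply stand in for whatever command you actually want to run on the subsample.

```stata
clear all

* Illustrative data only (not the asker's dataset)
input houseid year month
223 1997 1
223 2000 2
448 2000 2
448 2000 3
end

* Flag every observation of a household whose earliest record is Feb. 2000
bysort houseid (year month): gen ok = year[1] == 2000 & month[1] == 2

* Nothing is dropped; later commands are restricted with "if ok"
count if ok
list houseid year month if ok, sepby(houseid)
```

Because no observations are deleted, the full data stay available for other comparisons later in the same session.
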
    For more advanced date manipulations, try working with date variables. See, for example:

    http://www.stata.com/help.cgi?dates_and_times

    http://www.stata.com/support/faqs/data-management/handling-date-information/
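
To illustrate the date-variable route, here is a minimal sketch that builds a monthly date with ym() and compares each household's earliest month against February 2000. The variable names mdate and first and household 500 are assumptions made for this example; they are not part of the original data.

```stata
clear all

* Illustrative data only; household 500 is hypothetical
input houseid year month
223 1997 1
223 2000 2
448 2000 2
448 2000 3
500 2001 5
end

* Combine year and month into a single Stata monthly date
gen mdate = ym(year, month)
format mdate %tm

* Earliest month observed for each household
egen first = min(mdate), by(houseid)

* Keep households whose first observation is exactly February 2000.
* For "started in or after Feb. 2000", use >= instead of ==.
keep if first == ym(2000, 2)

list houseid mdate, sepby(houseid)
```

With a single date variable, range conditions such as "in or after February 2000" become one comparison instead of separate conditions on year and month.
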