Search code examples
rtidyversestatapanel

Tracking changes per observation in a continuous variable


I am trying to evaluate a property tax policy that was introduced in a U.S county, where properties over a threshold (i.e. 500 square meters) faced a higher property tax rate than those below the cutoff. I have microdata for all properties in the county between 1990 and 2006. Anecdotally, I am aware that some landowners of properties over 500 square meters tried to avoid the tax by breaking their property into several sub-properties, so that they are right below the cutoff.

However, I am trying to investigate empirically by tracking two variables “lot_number” and "area" which refer to the floorplan and area for each property in the county. Specifically, if I notice that hypothetical "lot_number" A within "masterplan" 100 changes its "area" from 800 square meters before the tax to say 400 square meters post the policy announcement, then this is evidence of tax avoidance behavior.

However, I am not sure how to code my data where I can monitor tax avoidance behavior as described above.

My dataset looks as follows:

* Example generated by -dataex-. For more info, type help dataex
clear
input str109 masterplan str66 lotnumber str40 area str30 pricesqm str42 transactiondate
"/2022"        " 46"                                 "625"      "260.8"     "2004/01/24"
"/2485"        " 261/2"                              "435"      "103.4483"  "2004/01/29"
"/2485"        " 63"                                 "625"      "75.2"      "2004/01/23"
"/3152"        " 114"                                "500"      "170"       "2004/01/28"
"/3152"        " 134"                                "993.05"   "160.6163"  "2004/01/06"
"/3152"        " 141"                                "600"      "131.44"    "2005/01/28"
"/3152"        " 159"                                "500"      "154"       "2003/01/28"
"/3152"        " 161"                                "500"      "155"       "2002/01/29"

Solution

  • One way of analyzing this would be to do as Nick suggested and use destring area pricesqm. Note that in the following code, I added four lines to your data example so that there was an example of a masterplan-lotnumber changing over time:

    clear
    
    input str109 masterplan str66 lotnumber str40 area str30 pricesqm str42 transactiondate,
    "/2022"        " 46"                                 "625"      "260.8"     "2004/01/24"
    "/2485"        " 261/2"                              "435"      "103.4483"  "2004/01/29"
    "/2485"        " 63"                                 "625"      "75.2"      "2004/01/23"
    "/3152"        " 114"                                "500"      "170"       "2004/01/28"
    "/3152"        " 134"                                "993.05"   "160.6163"  "2004/01/06"
    "/3152"        " 141"                                "600"      "131.44"    "2005/01/28"
    "/3152"        " 159"                                "500"      "154"       "2003/01/28"
    "/3152"        " 161"                                "500"      "155"       "2002/01/29"
    "/9998"        " 999"                                "800"      "155"       "2003/02/28"
    "/9998"        " 999"                                "400"      "155"       "2004/03/15"
    "/9999"        " 999"                                "800"      "155"       "2004/02/28"
    "/9999"        " 999"                                "800"      "155"       "2005/03/15"
    
    end
    
    compress
    destring area pricesqm, replace
    
    *create a clean date from the transaction date and format for ease
    gen trans_date_clean = daily(transactiondate, "YMD")
    format trans_date_clean %tdnn/dd/YY
    
    *create an id for each masterplan-lotnumber
    sort masterplan lotnumber trans_date_clean
    egen id = group(masterplan lot)
    
    *create a flag that equals 1 if the id is the same as the previous
    *id, the previous area is greater than 500, and the area of this 
    *observation is less than 500. This approach depends on the sort
    *before the egen command above so that masterplan-lotnumbers are
    *grouped together.
    gen flag = 0
    replace flag = 1 if id == id[_n-1] & area[_n-1] > 500 & area < 500
    

    In addition, this is just one way to approach this problem. After this, you might need to take steps to ensure that you don't double count any lots if they change areas frequently, or you might need to add extra conditions to the flag if the area changes need to be within a certain time period. Alternatively, you could also consider reshaping the data wide by the id (the masterplan-lotnumber) and the transaction year (from the transaction date) and comparing the differences in housing areas between two years.