I am trying to replicate the Fama-French Operating Profit factor (RMW). I have written the Stata code and got the result in the plot below. For reference, the correlation is only about 0.909. I have been trying to improve the result for a couple of weeks but could not get any progress. I am wondering if I could get some advice here.
Thank you very much!
***Replicating RMW***
use "compustat.dta"
*Process preferred stock*
gen ps = pstkrv
replace ps = pstkl if ps == .
replace ps = pstk if ps == .
replace ps = 0 if ps == .
sum ps, d
*Process stockholder equity*
gen se = seq
replace se = ceq + pstk if se == .
replace se = at - lt if se == .
replace se = bkvlps * csho + pstkrv if se == .
replace se = bkvlps * csho + pstkl if se == .
replace se = bkvlps * csho + pstk if se == .
replace se = bkvlps * csho if se == .
sum se, d
*Process book equity***
replace itcb = 0 if itcb == .
replace txdb = 0 if txdb == .
gen be = se + txdb + itcb - ps
sum be, d
*Process operating profit*
gen n = 0
replace n = 1 if cogs != . | xsga != . | xint != .
replace n = 0 if n == 1 & revt == .
keep if n==1
drop n
keep if be !=.
replace cogs = 0 if cogs == .
replace xsga = 0 if xsga == .
replace xint = 0 if xint == .
drop if be < 0
gen op = 100*(revt - cogs - xsga - xint)/be
drop if op == .
gen n = 1
bys gvkey fyear: egen N = sum(n)
keep if N == 1
keep gvkey datadate fyear be op
append using "Multiple_Observations_Per_Year_processed.dta"
save "OP.dta",replace
*CRSP Side Data Cleaning*
use "CRSP Monthly Return_2.dta", clear
gen year = year(date)
gen month = month(date)
gen modate = ym(year,month)
format modate %tm
keep if shrcd == 10 | shrcd == 11
keep if exchcd == 1 | exchcd == 2 | exchcd == 3
gen age = modate
*Adjust for Delisting*
replace ret = 0 if ret == . & dlret != .
replace dlret = 0 if dlret ==.
gen retadj = (1 + ret)*(1+dlret) - 1
replace ret = retadj
drop if ret ==.
gen price = abs(prc)
gen size = price*shrout
*Dealing with CRSP issue of multiple PERMNO for one PERMCO*
*Following conventional method, sum up market cap for permnos with the same permco, and assign to the permno with largest market cap. Keep that permno.*
gen n = 1
bys permco modate: egen N = sum(n)
sum N
preserve
keep if N > 1
bys permco modate: egen sum_size = sum(size)
bys permco modate: egen rank = rank(-size)
keep if rank == 1 | rank == 1.5
duplicates drop permno modate, force
drop rank
replace size = sum_size
drop sum_size
save "crsp_multiple_permno_2.dta",replace
restore
keep if N == 1
append using "crsp_multiple_permno_2.dta"
drop retadj n N
save "CRSP Monthly Return_2_processed.dta",replace
*Link Compustat accounting OP to CRSP data*
use "OP.dta", clear
merge m:m gvkey using"link_primary"
keep if _merge==3
drop _merge
gen link_year = year( linkdt)
gen end_year = year( linkenddt )
gen year = fyear + 1
keep if year >= link_year
keep if fyear <= end_year | end_year == .
rename N age_in_compustat
keep gvkey fyear op permno year age_in_compustat
duplicates drop permno year, force
merge 1:1 permno year using"June_Size_2.dta"
keep if _merge == 3
drop _merge
drop if size == .
tab year
tab fyear
drop if fyear < 1961
*Generate 2x3 portfolio using f-f breakpoints*
merge m:1 year using"OP_Breakpoints.dta"
keep if _merge==3
drop _merge
gen p_3 = .
replace p_3 = 1 if op <= thirty
replace p_3 = 2 if op > thirty & op < seventy
replace p_3 = 3 if op >= seventy
merge m:1 year month using"ME_Breakpoints.dta"
keep if _merge==3
drop _merge
replace size = size/1000
gen s_2 = .
replace s_2 = 1 if size <= fifty
replace s_2 = 2 if size > fifty
keep gvkey permno year p_3 s_2 age_in_compustat op
rename year myear
save "2x3 sorts.dta", replace
*Generate monthly portfolio returns*
use"CRSP Monthly Return_2_processed.dta", clear
drop shrcd exchcd prc shrout
gen myear = year
replace myear = year - 1 if month<=6
merge m:1 permno myear using"2x3 sorts.dta"
drop if _merge == 2
drop _merge
*drop if p_3 == 2
duplicates drop permno modate, force
*Weighting alternative #1*
tsset permno modate
gen l_size = l.size
*Weighting alternative #2*
tsset permno modate
gen l2_size = l2.size
gen l_retx = l.retx
gen l_size = l2_size*(1+l_retx)
*Generate returns*
preserve
collapse (mean) vwret = ret [w=l_size] if p_3 == 1, by(modate s_2)
reshape wide vwret, i(modate) j(s_2)
gen ret_weak = (vwret1 + vwret2) / 2
replace ret_weak = ret_weak * 100
keep modate ret_weak
save "ret_weak.dta",replace
restore
preserve
collapse (mean) vwret = ret [w=l_size] if p_3 == 3, by(modate s_2)
reshape wide vwret, i(modate) j(s_2)
gen ret_robust = (vwret1 + vwret2) / 2
replace ret_robust = ret_robust * 100
keep modate ret_robust
save "ret_robust.dta",replace
*Compare replicated RMW with actual rmw*
merge 1:1 modate using"ret_weak.dta"
drop _merge
gen RMW = ret_robust - ret_weak
sum RMW
merge 1:1 modate using"f-f rmw.dta"
keep if _merge == 3
drop _merge
gen diff = RMW-rmw
sum diff
*Plotting RMW against rmw*
gen age = modate
gen RMW_ret = (100+RMW)/100
gen rmw_ret = (100+rmw)/100
sort modate
gen cum_ret_RMW = .
replace cum_ret_RMW = RMW_ret if age == 42
replace cum_ret_RMW = RMW_ret * cum_ret_RMW[_n-1] if cum_ret_RMW == .
sort modate
gen cum_ret_rmw = .
replace cum_ret_rmw= rmw_ret if age == 42
replace cum_ret_rmw = rmw_ret * cum_ret_rmw[_n-1] if cum_ret_rmw == .
tsset modate
tsline cum_ret_RMW cum_ret_rmw
Cumulative Return (my replication is in blue, original factor in red)
I found that the breakpoints have to be based on NYSE stocks only and then apply to the other stock exchanges. My correlations increase from 90% to 96% after this refinement.
Hope that helps.