I have daily panel data with four variables: date, cusip (the security identifier), PD (probability of default), and price. PD is only available quarterly, on the first day of January, April, July, and October. I want to generate daily PD values using Chow-Lin frequency conversion from the tempdisagg package. I know how to apply the td() function to a single time series, but I couldn't find examples with panel data frames. Below are my code and sample data created with dput(), so only a few sample days are included instead of the full quarters. Running td() reports an error:
Error in td(PD ~ price, conversion = "first", method = "chow-lin-fixed", fixed.rho = 0.5) : In numeric mode, 'to' must be an integer number.
I know that both price and PD are high-frequency daily indicators in mydata, so I guess I need to apply the to.quarterly() function to PD, or something similar.
library(dplyr)
library(zoo)
library(tempdisagg)
library(tsbox)
mydata <- structure(
  list(
    date = structure(c(13516, 13516, 13517, 13517, 13518, 13518, 13521, 13605, 13605, 13606), class = "Date"),
    cusip = c("31677310", "66585910", "31677310", "66585910", "31677310", "66585910", "31677310", "66585910", "31677310", "66585910"),
    PD = c(0.076891, 0.096, NA, NA, NA, NA, NA, 0.094341, 0.08867, NA),
    price = c(40.98, 61.31, 40.99, 60.77, 40.18, 59.97, 39.92, 59.96, 38.6, 60.69)
  ),
  row.names = c(6L, 13L, 36L, 43L, 66L, 73L, 96L, 1843L, 1866L, 1873L),
  class = "data.frame"
)
mydata <- mydata %>%
  group_by(cusip) %>%
  arrange(cusip, date) %>%
  mutate(PDdaily = td(PD ~ price, conversion = "first",
                      method = "chow-lin-fixed", fixed.rho = 0.5))
Your example is not sufficient: each disaggregation needs at least 3 low-frequency values to estimate the regression. Here is an alternative example with 3 pairs of low- and high-frequency series:
library(tidyverse)
library(tempdisagg)
library(tsbox)
mydata <- ts_c(
  low_freq = ts_frequency(fdeaths, "year"),
  high_freq = mdeaths
) %>%
  ts_tbl() %>%
  ts_wide() %>%
  crossing(id = 1:3) %>%
  arrange(id)
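The result is a tibble with columns time, high_freq, and low_freq, replicated for each of the three ids; low_freq is populated only at the first month of each year. If you want to verify the shape before splitting, a quick check along these lines should work (fdeaths and mdeaths cover 1974-1979, so expect 72 monthly and 6 annual values per id):
# Count non-missing monthly and annual observations per id
mydata %>%
  group_by(id) %>%
  summarise(
    n_monthly = sum(!is.na(high_freq)),
    n_annual  = sum(!is.na(low_freq))
  )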
Applying td multiple times to data stored in a data frame is cumbersome. It is easier to extract the data into two lists, one with the low-frequency and one with the high-frequency series:
list_lf <- group_split(ts_na_omit(select(mydata, time, value = low_freq, id)), id, .keep = FALSE)
list_hf <- group_split(select(mydata, time, value = high_freq, id), id, .keep = FALSE)
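Because both calls split on id, the i-th elements of the two lists belong to the same series, and each pair can be disaggregated on its own. For the first id, for example (a hedged illustration of the step that map2() repeats below):
# Disaggregate the first annual series with its monthly indicator
lf1 <- list_lf[[1]]
hf1 <- list_hf[[1]]
m1 <- td(lf1 ~ hf1)
predict(m1)  # returns the disaggregated monthly series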
Now you can use Map() or map2() to apply the function to each pair of elements:
ans <- map2(list_lf, list_hf, ~ predict(td(.x ~ .y)))
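If you prefer base R, Map() does the same job; this sketch should be equivalent to the map2() call above:
# Base R alternative to purrr::map2()
ans <- Map(function(lf, hf) predict(td(lf ~ hf)), list_lf, list_hf)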
Finally, combine the disaggregated series back into a single data frame:
bind_rows(ans, .id = "id")
#> # A tibble: 216 x 3
#> id time value
#> <chr> <date> <dbl>
#> 1 1 1974-01-01 59.2
#> 2 1 1974-02-01 54.2
#> 3 1 1974-03-01 54.4
#> 4 1 1974-04-01 54.4
#> 5 1 1974-05-01 47.3
#> 6 1 1974-06-01 42.8
#> 7 1 1974-07-01 43.3
#> 8 1 1974-08-01 40.6
#> 9 1 1974-09-01 42.0
#> 10 1 1974-10-01 47.3
#> # … with 206 more rows
Created on 2020-06-03 by the reprex package (v0.3.0)
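Carrying this back to the original panel: split by cusip instead of id, use the non-missing quarterly PD observations as the low-frequency series and the daily prices as the high-frequency indicator. A minimal sketch, assuming a full panel called paneldata with the question's four columns, at least 3 non-missing PD values per cusip (the posted sample has too few), and tempdisagg >= 1.0, which handles daily series stored in data frames:
# Hypothetical: paneldata = full daily panel (date, cusip, PD, price)
list_pd <- group_split(
  select(filter(paneldata, !is.na(PD)), time = date, value = PD, cusip),
  cusip, .keep = FALSE
)
list_price <- group_split(
  select(paneldata, time = date, value = price, cusip),
  cusip, .keep = FALSE
)
ans <- map2(
  list_pd, list_price,
  ~ predict(td(.x ~ .y, conversion = "first",
               method = "chow-lin-fixed", fixed.rho = 0.5))
)
bind_rows(ans, .id = "cusip_index")  # ids follow the sorted cusip order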