Please consider the following:
I need to provide some R code syntax to analyse data with the flexsurv
package. I am not allowed to receive/analyse directly or on-site. I am however allowed to receive the analysis results.
Problem
When we run the flexsurvreg()
function on some data (here ovarian
from the flexsurv
package), the created object (here fitw
) contains enough information to "re-create" or "back-engineer" the actual data. But then I would technically have access to the data I am not allowed to have.
# Load package
library("flexsurv")
#> Loading required package: survival
# Run flexsurvreg with data = ovarian
fitw <- flexsurvreg(formula = Surv(futime, fustat) ~ factor(rx) + age,
data = ovarian, dist="weibull")
# Look at first observation in ovarian
ovarian[1, ]
#> futime fustat age resid.ds rx ecog.ps
#> 1 59 1 72.3315 2 1 1
# With the following from the survival object, the data could be re-created
fitw$data$Y[1, ]
#> time status start stop time1 time2
#> 59 1 0 59 59 Inf
fitw$data$m[1, ]
#> Surv(futime, fustat) factor(rx) age (weights)
#> 1 59 1 72.3315 1
Potential solution
We could write the code so that it also sets all those data that might be used for this back-engineering to NA
as follows:
# Setting all survival object observation to NA
fitw$data$Y <- NA
fitw$data$m <- NA
fitw$data$mml$scale <- NA
fitw$data$mml$rate <- NA
fitw$data$mml$mu <- NA
Created on 2021-08-27 by the reprex package (v2.0.0)
Question
If I proceed as the above and set all these parameters to NA
, could I then receive the fitw
object (e.g. as an .RDS
file) without ever being able to "back-engineer" the original data? Or is there any other way to share fitw
without the attached data?
Thanks!
Setting, e.g. fitw$data <- NULL
will remove all the individual-level data from the fitted model object. Some of the output functions may not work with objects stripped of data however. In the current development version on github, printing the model object should work. Also summary and predict methods should work, as long as covariate values are supplied in newdata
- omitting them won't work, since the default is to take the covariate values from the observed data.