
De-identifying survival or flexsurvreg objects in R


Please consider the following:

I need to provide some R code to analyse data with the flexsurv package. I am not allowed to receive the data or to analyse it directly or on-site. I am, however, allowed to receive the analysis results.


Problem

When we run the flexsurvreg() function on some data (here ovarian, from the survival package that flexsurv loads), the created object (here fitw) contains enough information to "re-create" or "back-engineer" the actual data. I would then technically have access to data I am not allowed to have.

# Load package
library("flexsurv")
#> Loading required package: survival

# Run flexsurvreg with data = ovarian
fitw <- flexsurvreg(formula = Surv(futime, fustat) ~ factor(rx) + age,
                    data = ovarian, dist="weibull")

# Look at first observation in ovarian
ovarian[1, ]
#>   futime fustat     age resid.ds rx ecog.ps
#> 1     59      1 72.3315        2  1       1

# With the following components of the fitted object, the data could be re-created
fitw$data$Y[1, ]
#>   time status  start   stop  time1  time2 
#>     59      1      0     59     59    Inf
fitw$data$m[1, ]
#>   Surv(futime, fustat) factor(rx)     age (weights)
#> 1                   59          1 72.3315         1


Potential solution

We could write the code so that it also sets all those data that might be used for this back-engineering to NA as follows:

# Set all individual-level data stored in the fitted object to NA
fitw$data$Y <- NA
fitw$data$m <- NA
fitw$data$mml$scale <- NA
fitw$data$mml$rate <- NA
fitw$data$mml$mu <- NA

Created on 2021-08-27 by the reprex package (v2.0.0)


Question

If I proceed as above and set all of these components to NA, could I then receive the fitw object (e.g. as an .RDS file) without being able to "back-engineer" the original data? Or is there another way to share fitw without the attached data?

Thanks!


Solution

  • Setting fitw$data <- NULL removes all the individual-level data from the fitted model object. Some of the output functions may not work on an object stripped of its data, however. In the current development version on GitHub, printing the model object should work. The summary and predict methods should also work, as long as covariate values are supplied in newdata. Omitting newdata won't work, because the default is to take the covariate values from the observed data.
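
A minimal sketch of that workflow, reusing the model from the question. The names fit_shared, fit_received, rds_path and newdat are made up for illustration, and the covariate values in newdat are arbitrary. Whether the summary() call succeeds on a stripped object depends on the installed flexsurv version, so it is wrapped in try() here rather than assumed to work.

```r
library(flexsurv)

# Data holder's side: fit the model, then drop the individual-level data
fitw <- flexsurvreg(Surv(futime, fustat) ~ factor(rx) + age,
                    data = ovarian, dist = "weibull")

fit_shared <- fitw
fit_shared$data <- NULL          # removes Y, m and mml in one step

# Write the stripped object to an .RDS file (temporary path for illustration)
rds_path <- tempfile(fileext = ".rds")
saveRDS(fit_shared, rds_path)

# Recipient's side: read the object back and use the aggregate results
fit_received <- readRDS(rds_path)

is.null(fit_received$data)       # the data component is gone
fit_received$res                 # parameter estimates, CIs and standard errors
fit_received$loglik              # maximised log-likelihood

# Covariate values must be supplied explicitly, since there is no observed
# data to fall back on. This may fail on older flexsurv versions, hence try().
newdat <- data.frame(rx = 1, age = 60)   # arbitrary illustrative values
surv_est <- try(
  summary(fit_received, newdata = newdat, t = c(100, 500), ci = FALSE),
  silent = TRUE
)
```

If surv_est inherits from "try-error", the installed version probably cannot summarise a stripped object. In that case the person holding the data could run summary() before removing the data and share those results instead.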