How to encrypt an SPSS file using cyphr


Is there a way to encrypt SPSS files (.sav) using the cyphr package? Encrypting .csv files works fine, but when I try to encrypt a .sav file, I get the following error message:

  Error in db_lookup(dat$ns, dat$name, file_arg) : 
  Rewrite rule for haven::write_sav not found

Solution

  • I found a solution: first convert the original files (*.csv and *.sav) into *.rds files, then encrypt those. This works as intended.

    With this procedure, every *.csv and *.sav file in the original folder gets an encrypted *.rds file of the same name, saved in a separate folder.

    Load packages:

    library(rio)
    library(stringr)
    library(cyphr)
    

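    Set the paths to the folder with the original, unencrypted data (data_originals) and to the folder that will hold the encrypted data (data_encrypted). Absolute paths are used because the working directory changes in the steps below:

    path_originals <- normalizePath("./data_originals")
    dir.create("./data_encrypted", showWarnings = FALSE)  # must exist before normalizePath()
    path_encrypted <- normalizePath("./data_encrypted")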
    

    Set working directory:

    setwd(path_originals)
    

    Specify the directory in which the encrypted files are to be stored (data_encrypted):

    data_dir <- file.path(path_encrypted)
    

    Set the path to the folder that holds your personal key:

    path_key_user <- "~/.ssh/"
    

    Create a key for the data and encrypt that key with your personal key:

    data_admin_init(data_dir, path_user = path_key_user)
    

    Get the data key and add encrypted data to the directory:

    key <- cyphr::data_key(data_dir, path_user = path_key_user)
    

    For *.csv-files:

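    List the names of all *.csv files in the folder data_originals (the pattern is a regular expression, so the dot is escaped and the match is anchored to the end of the name):

    filenames_csv <- list.files(path = path_originals, pattern = "\\.csv$")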
    

    Read in *.csv files located in the folder data_originals:

    df_csv <- lapply(filenames_csv, read.csv2)
    

    Create the *.rds file names under which the *.csv files should be saved:

    filenames_csv2rds <- str_replace(filenames_csv, "\\.csv$", ".rds")
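
    The pattern arguments of list.files() and str_replace() are regular expressions, not shell globs, so an unescaped dot matches any character. The following is a minimal base-R sketch of the difference. The file names are made up, and sub() stands in for str_replace() so that the snippet needs no packages:

    ```r
    # Hypothetical file names, for illustration only
    files <- c("survey.csv", "my_csv_data.csv", "csv_notes.txt", "panel.sav")

    # Escaped and anchored pattern: matches only names that end in ".csv"
    is_csv <- grepl("\\.csv$", files)
    csv_files <- files[is_csv]

    # Unescaped ".csv" replaces the first "any character + csv" it finds
    naive <- sub(".csv", ".rds", csv_files)

    # Escaped and anchored pattern replaces only the file extension
    safe <- sub("\\.csv$", ".rds", csv_files)

    # glob2rx() converts a shell-style glob into an equivalent regular expression
    glob_rx <- utils::glob2rx("*.csv")

    print(naive)
    print(safe)
    ```

    With these names, the unescaped pattern turns my_csv_data.csv into my.rds_data.csv, while the anchored pattern only swaps the extension.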
    

    Save the *.csv files as *.rds files to the folder created for the encrypted files (data_encrypted):

    setwd(path_encrypted)
    for (i in seq_along(df_csv)) {
      # [[i]] extracts the data frame itself; [i] would pass a one-element list
      export(df_csv[[i]], filenames_csv2rds[i])
    }
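
    When a list element is written to disk, it matters whether it is extracted with single or double brackets: single brackets return a list of length one, double brackets return the element itself. Below is a small base-R sketch of the difference, with a made-up list of data frames and saveRDS() standing in for export():

    ```r
    # Hypothetical data, for illustration only
    dfs <- list(data.frame(x = 1:3), data.frame(y = c("a", "b")))

    single <- dfs[1]    # a list of length one that contains the data frame
    double <- dfs[[1]]  # the data frame itself

    # Round trip through a temporary *.rds file
    tmp <- tempfile(fileext = ".rds")
    saveRDS(double, tmp)
    restored <- readRDS(tmp)
    unlink(tmp)

    print(class(single))
    print(class(double))
    ```

    Saving the double-bracket version means that readRDS() later returns a data frame directly rather than a list that has to be unwrapped first.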
    

    For *.sav-files:

    Set working directory:

    setwd(path_originals)
    

    List the names of all *.sav files in the folder data_originals:

    filenames_sav <- list.files(path = path_originals, pattern = "\\.sav$")
    

    Read in the *.sav files located in the folder data_originals:

    df_sav <-
      lapply(filenames_sav,
             Hmisc::spss.get,
             use.value.labels = TRUE,
             lowernames = TRUE)
    

    Create the *.rds file names under which the *.sav files should be saved:

    filenames_sav2rds <- str_replace(filenames_sav, "\\.sav$", ".rds")
    

    Save the *.sav files as *.rds files to the folder created for the encrypted files (data_encrypted):

    setwd(path_encrypted)
    for (i in seq_along(df_sav)) {
      # [[i]] extracts the data frame itself; [i] would pass a one-element list
      export(df_sav[[i]], filenames_sav2rds[i])
    }
    

    List the names of the *.rds files that are now in the data_encrypted folder and still need to be encrypted:

    filenames <- list.files(path = path_encrypted, pattern = "\\.rds$")
    

    Read in all *.rds files located in the folder data_encrypted (at this point it is still the working directory):

    ldf <- lapply(filenames, readRDS)
    

    Define the target paths:

    paths <- file.path(data_dir, filenames)
    

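    Encrypt all files and save them in the folder data_encrypted, overwriting the unencrypted *.rds files:

    for (i in seq_along(ldf)) {
      # Wrapping saveRDS() in encrypt() writes the file encrypted with the data key
      encrypt(saveRDS(ldf[[i]], paths[i]), key)
    }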
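
    To check the result, an encrypted file can be read back by wrapping the read call in decrypt(). The sketch below is self-contained and makes assumptions that are not part of the setup above: it uses a throwaway symmetric key from cyphr::key_sodium() instead of the data key, and temporary files instead of the data folders. It also shows cyphr::encrypt_file(), which encrypts a file as-is and so may be an option for keeping *.sav files in their original format. The .sav file here contains placeholder bytes only, not real SPSS data:

    ```r
    library(cyphr)

    # Throwaway key for the demo; in the workflow above this would be the data key
    key <- key_sodium(sodium::keygen())

    # Encrypted write and read of an R object
    rds_path <- tempfile(fileext = ".rds")
    encrypt(saveRDS(mtcars, rds_path), key)
    restored <- decrypt(readRDS(rds_path), key)

    # Encrypt an arbitrary file without any format-specific read or write function
    plain <- tempfile(fileext = ".sav")
    writeBin(as.raw(1:10), plain)  # placeholder bytes, not a real SPSS file
    locked <- paste0(plain, ".encrypted")
    encrypt_file(plain, key, locked)

    # Decrypt it again into a new file
    unlocked <- tempfile(fileext = ".sav")
    decrypt_file(locked, key, unlocked)
    ```

    With the data key from data_key(), the same decrypt(readRDS(...), key) call should work on the files written by the loop above.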