Search code examples
sqlrrdbmsspss

Automating definition of tables and columns creating an RDBMS from a stats package like SPSS


What is a good tool to use to automate or semi-automate (ie give a good headstart) the process of taking a rectangle of data from a statistics package like SPSS and:

  • saving the main rectangle data in text format that can be read by a database
  • saving additional files for other reference tables (eg value labels)
  • drafting (probably to be polished up by hand) a script or batch file with the SQL to create the tables and columns in a database and import the text files as rows of data in the relevant tables?

I doubt full automation is possible but this must be a reasonably common task. We have about a dozen such datasets, some with several hundred variables, that we want to set up in a relational database (Oracle, if that makes any difference). There is no conceptual difficulty in doing this by hand other than prohibitive cost.

I feel there must be such a tool available but I am clearly searching in the wrong places or using the wrong terminology.

(edit - added the R tag because in my own answer to this I am using it as part of the solution)


Solution

  • OK, after further investigation (and thanks for the answer I was given, which was helpful although not quite fully there), I now favour:

    1. bringing the data into R eg with the foreign package
    2. Creating a data frame for each reference table I need
    3. Replacing the original factor with an as.numeric() or unclass() version, so it is just the numbers, not the labels
    4. saving both the main data and the reference tables to the database using sqlSave() from the RODBC package.

    Step 2 is facilitated by a little function like this:

    factorToRef <- function(x, field){
        tmp <- levels(x)
        tab <- data.frame(1:length(tmp), tmp)
        names(tab) <- paste(field, c("_ID","_NAME"), sep="")
        tab
    }
    

    Which can give results like

    > data(iris)
    > factorToRef(iris$Species, "species" )
      species_ID species_NAME
    1          1       setosa
    2          2   versicolor
    3          3    virginica
    

    That are then the basis of a reference table to be saved in the database.