How to create variables using R through SPSS and pass back to SPSS?


I've been experimenting with calling R through SPSS.

I have figured out how to pull SPSS data into an R dataframe, create a variable, and pass the dataframe with the new variable back to an SPSS dataset.

What I cannot figure out how to do is pass back variables that are additional transformations of the first variable created using R.

Specifically, I first create the variable

index <- c("INDX","label",0,"F8.2","scale")

by scaling the variable B from 0 to 1 and create the dataframe casedata using the code below:

casedata <- data.frame(casedata, ave(casedata$B, casedata$Patient_Type, 
    FUN = function(x) (x- min(x))/(max(x)- min(x))))

I can successfully pass the new dataframe back to SPSS and everything's fine. But in the same call to R, I would like to create a new variable

indexave <- c("INDX_Ave","label",0,"F8.2","scale")

which indexes INDX to the average of itself using the code below:

casedata <- data.frame(casedata, casedata$INDX/mean(casedata$INDX))

I cannot figure out how to pass INDX_Ave back to SPSS.

I suspect that it has to do with the way SPSS assigns names to new variables. You'll notice that

ave(casedata$B, casedata$Patient_Type, FUN = function(x) (x- min(x))/(max(x) - min(x)))

doesn't have casedata$INDX= in front of it. SPSS apparently knows from this line of code

index <- c("INDX","label",0,"F8.2","scale")

to pass the name INDX to the first variable created. I believe this disjointedness of the variable name from the variable itself is preventing the additional variable INDX_Ave from being created.

Below is my entire program block:

BEGIN PROGRAM  R.
dict <- spssdictionary.GetDictionaryFromSPSS()
casedata <- spssdata.GetDataFromSPSS(factorMode="labels")
catdict <- spssdictionary.GetCategoricalDictionaryFromSPSS()
index <- c("INDX","Level Importance Index",0,"F8.2","scale")
indexave <- c("INDX_Ave","Level importance indexed to average importance",0,"F8.2","scale")
dict<-data.frame(dict,index,indexave) 

casedata <- data.frame(casedata, ave(casedata$B, casedata$Patient_Type, 
                                    FUN = function(x) (x- min(x))/(max(x)- min(x))))

casedata <- data.frame(casedata, casedata$INDX/mean(casedata$INDX)) # doesn't work

spssdictionary.SetDictionaryToSPSS("BWOverallBetas2",dict,categoryDictionary=catdict)
spssdata.SetDataToSPSS("BWOverallBetas2",casedata,categoryDictionary=catdict)
spssdictionary.EndDataStep()
END PROGRAM.

Solution

  • See the section "Writing Results to a New IBM SPSS Statistics Dataset" in the R Programmability doc. The names in the dictionary you pass govern the names on the SPSS side, but note that the rules for legal variable names in SPSS and R are different, although that isn't an issue here. Also, you can't create a dataset if SPSS is in procedure state (also not an issue with this code).

    Your code adds INDX to the SPSS dictionary and computes it via ave but does not assign the name INDX in the casedata data frame. Then it adds another variable but does not add that to the dictionary to be sent to SPSS, so the sizes of the dictionary and the data frames don't match.

    Note also that you can omit the factorMode argument in GetDataFromSPSS and then not bother with the categorical dictionary, because the values will be unchanged.

    HTH
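
The naming problem described above can be reproduced in plain R, outside SPSS. The following is a minimal sketch, assuming a toy data frame in place of the result of spssdata.GetDataFromSPSS(): the values are made up, and only the column names B and Patient_Type come from the question. It shows that an unnamed argument to data.frame() does not produce a column called INDX, and that naming each new column explicitly makes the second transformation work.

```r
# Toy stand-in for the data frame returned by spssdata.GetDataFromSPSS().
# Hypothetical values; only the names B and Patient_Type are from the question.
casedata <- data.frame(
  Patient_Type = c("A", "A", "A", "B", "B", "B"),
  B = c(1, 2, 3, 10, 20, 40)
)

# Unnamed argument, as in the question: R builds a column name from the
# expression text, so no column named INDX exists afterwards.
unnamed <- data.frame(casedata, ave(casedata$B, casedata$Patient_Type,
    FUN = function(x) (x - min(x)) / (max(x) - min(x))))
print(names(unnamed))
print(is.null(unnamed$INDX))

# Named arguments: each new column gets the same name as its dictionary entry.
casedata <- data.frame(casedata, INDX = ave(casedata$B, casedata$Patient_Type,
    FUN = function(x) (x - min(x)) / (max(x) - min(x))))
casedata <- data.frame(casedata, INDX_Ave = casedata$INDX / mean(casedata$INDX))
print(names(casedata))
```

Inside the BEGIN PROGRAM R block, the two named data.frame() calls would take the place of the two original casedata <- lines, so that the data frame ends up with one column per dictionary entry, in the same order as dict. This has not been run inside SPSS, so treat it as a starting point rather than a verified fix.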