
Dynamically calling input variables for classification and regression tasks


I have a large number of files that share the same target variables but have many input variables that vary from file to file. I would like to run classification and regression analyses on each new file without explicitly listing the input variables every time.

I can define a list of input variables within SPSS using spssinc select variables, which matches a regular expression against the variable names. For most tasks I would then loop over that list with a macro so that I do not need to list the variables explicitly. However, a loop is not appropriate for many classification and regression tasks: I only need to run the analysis once for a single target variable, and I just need to pass in the list of input variables.

Below is an example dataset (much smaller than the datasets I am working with).

data list list/ID (A3) Sex (A1) Age (F2.0) Education (A5) Test_price01 Test_new01 Test_income01 Test_exp01 Test_01 Test_house01 Test_car01 Test_boat01 Test_var01 Test_var02 .
begin data
    ID1 M 20 Prim 1 2 3 4 5 6 7 8 9 9
    ID2 F 22 High 5 4 3 6 3 8 1 2 5 8
    ID3 M 30 High 0 8 6 4 2 1 3 5 7 9
end data.
dataset name survey.

I would like to run a discriminant analysis, which I could do manually with the code below:

DATASET ACTIVATE survey.
DISCRIMINANT
  /GROUPS=Age(20 30)
  /VARIABLES=Test_price01 Test_new01 Test_income01 Test_exp01 Test_01 Test_house01 Test_car01 
    Test_boat01 Test_var01 Test_var02
  /ANALYSIS ALL
  /PRIORS EQUAL 
  /CLASSIFY=NONMISSING POOLED MEANSUB.

I have been able to define the input variables using spssinc select variables with the regular expression ".*Test_" (any name containing Test_):

spssinc select variables macroname="!Test_Vars" /properties pattern=".*Test_".
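To check which variables the pattern actually captured, one option is to turn on macro-expansion printing and run a simple procedure on the macro. This is a sketch, assuming the spssinc select variables command above has already been run in the same session so that !Test_Vars is defined; DESCRIPTIVES is used only because it is a cheap command that accepts numeric variables.

```
* Display the text produced by macro expansion in the output log.
SET MPRINT=YES.
* The expanded variable list for !Test_Vars appears in the log before the results.
DESCRIPTIVES VARIABLES=!Test_Vars.
* Switch macro printing back off.
SET MPRINT=NO.
```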

It would be great if I could somehow use this list (or another approach) to dynamically update the input variables for classification and regression tasks.


Solution

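  • That is exactly what the macro name from spssinc select variables is for: you put it in the syntax in place of a list of variables, and when the command runs the macro expands to the variable names that matched the pattern. Run spssinc select variables first (again for each new file, so the macro reflects that file's variables), then reference the macro.
    So your syntax should look like this: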

    DISCRIMINANT
      /GROUPS=Age(20 30)
      /VARIABLES= !Test_Vars
      /ANALYSIS ALL
      /PRIORS EQUAL 
      /CLASSIFY=NONMISSING POOLED MEANSUB.
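The same macro can be dropped into other procedures. As a hypothetical illustration for a regression task, the sketch below treats Age as the dependent variable and the matched Test_ variables as predictors; the choice of Age is an assumption for the example only. Note that the three-case example dataset is far too small to estimate ten predictors, so this is meant for the real, larger files.

```
* Hypothetical regression: Age as the dependent variable, !Test_Vars as predictors.
* Assumes spssinc select variables has already defined !Test_Vars for the active dataset.
DATASET ACTIVATE survey.
REGRESSION
  /DEPENDENT=Age
  /METHOD=ENTER !Test_Vars.
```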