I've got an SPSS dataframe looking like this:
K_161 K_161_A K_161_B K_161_B K_161_C K_161_D K_213
0 0 0 0 0 1 0
1 0 0 0 1 1 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 1 0 0 0 0 0
1 1 1 1 1 1 1
0 0 0 1 1 1 1
0 0 0 0 1 1 1
0 1 1 1 1 1 1
1 0 0 0 0 1 0
0 0 0 0 0 0 0
1 1 0 1 1 1 1
0 1 0 0 0 0 0
1 0 0 0 0 0 0
0 0 0 0 1 0 0
1 1 0 1 0 1 1
0 1 0 0 0 1 0
0 1 0 0 0 0 0
0 1 0 1 1 0 0
Basically I want to loop through the variables (columns) and return values for the reliability measure Cronbach's alpha. Something that would work along the lines of the following pseudocode in Python:
BEGIN PROGRAM.
import spss
unique_ids = open("E:/company_id.txt", "r")
for i in unique_ids:
i = "K_" + str(i)
spss.Submit(r'''
RELIABILITY
/VARIABLES= + i
/SCALE('ALL VARIABLES') ALL
/MODEL=ALPHA.
''')
END PROGRAM.
I've got about 9000 columns. The variables used in each loop can be grouped based on their unique number: e.g. for 161 the loop would include K_161 K_161_A K_161_B K_161_B K161_C K_161_D and so on. I've got a list of these unique numbers (textfile) looking like this:
161
213
263
284
295
297
306
347
The SPSS syntax returns a measure for Cronbach's alpha as well as the number of cases. I want to return a dataframe which looks like this:
id cronbach cases
161 0.651 5
213 0.856 3
263 0.752 6
284 0.927 22
295 0.884 24
297 0.934 31
306 0.902 9
347 1
398 0.405 4
457 0.884 12
491 0.137 2
507 0.909 28
571 1
590 0.738 9
I kind of lack the creativity to find the right solution here. Really looking for some ideas how you would approach the problem. Thank you very much!
First I'm creating two datasets to simulate your two files - one with your IDs and one with the actual data for analysis:
data list free/id (f3).
begin data
161 444 213
end data.
dataset name numlist.
data list free/K_161 K_161_A K_161_B K_213_B K_161_C K_161_D K_213 K_213_A K_444 K_444_F K_444_K (11f3).
begin data
3 3 3 4 4 4 5 5 1 2 3 4 5 6 7 8 9 0 9 8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
end data.
dataset name main.
Now the Idea is to use the ID list to write a new syntax file.
For each item in the list, the new syntax will:
1. define a macro that finds all the corresponding variable names in the main file
2. run a reliability analysis on them.
The OMS command will capture the cronbach data from the output into another dataset:
DATASET DECLARE MyAlphas.
OMS/SELECT TABLES/IF COMMANDS=['Reliability'] SUBTYPES=['Reliability Statistics']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='MyAlphas' .
dataset activate numlist.
write out="path\temp macro run.sps"
/'SPSSINC SELECT VARIABLES MACRONAME = "!a', id, '" /PROPERTIES PATTERN=".*', id, '*".'
/'RELIABILITY /VARIABLES= !a', id, ' /SCALE("ALL VARIABLES") ALL /MODEL=ALPHA.'.
dataset activate main.
insert file="path\temp macro run.sps".
omsend.
At this point you have a dataset called MyAlphas
which has the cronbach values for each reliability test that was run. It doesn't yet have the IDs in it, but it's easy enough to match files
this dataset with the IDs list -
you can add an index number to the list and match it with TableNumber_
in MyAlphas
.