The dataset contains 6 variables such as:
StudentID Sequence Metric Score Interpretation ScoreName
a123 8.00 13 540 12 P1_AIE
a123 9.00 14 550 19 P2_AIE
a123 9.00 15 500 13 P1_BP
a124 8.00 14 450 11 P2_AIE
I am ultimately trying to put all cases in one row such as:
StudentID P1_AIE_Seq P1_AIE_Scr P1_AIE_Interp P2_AIE_Seq P2_AIE_Scr P2_AIE_Interp P1_BP_Seq etc.
a123 8.00 540 12 9.00 550 19 9.00 etc.
a124 8.00 450 11
These are students (StudentID) who repeatedly complete a battery of tests (Sequence); their results (Score) are tracked by a numeric value (Metric) and a string (ScoreName), and a numeric code is attached for the interpretation for each score (Interpretation).
Metric and ScoreName always correspond (i.e., 14 = P2_AIE in all cases). There are no duplicate records. Sequence isn't initially appropriate to use as an index because it repeats within the ID variable, but to get around that I use:
SORT CASES BY StudentID Sequence.
compute dup = 0.
if $casenum>1 and lag(StudentID)= StudentID and lag(sequence) = Sequence dup=lag(dup)+1.
sort cases by StudentID dup Sequence.
CASESTOVARS
/ID = StudentID dup
/Index = Sequence
/GROUPBY = Index/sep="_".
I am obviously constructing the syntax incorrectly---this doesn't put a single case on a row, it won't rename variables by the ScoreName---not even sure that SPSS can do this without python.
Any feedback or advice would be greatly appreciated
In your example of the desired output it looks like you are using scorename
as the index, in which case there doesn't seem to be any need to fiddle with the sequence
variable as you did.
This recreates your example data:
data list list /StudentID (a4) Sequence Metric Score Interpretation (4f3) ScoreName (a6).
begin data.
"a123" 8 13 540 12 "P1_AIE"
"a123" 9 14 550 19 "P2_AIE"
"a123" 9 15 500 13 "P1_BP"
"a124" 8 14 450 11 "P2_AIE"
end data.
Now if you run the following, you get all the data for each ID in one line:
sort cases by studentID ScoreName.
casestovars /id=studentID /index=ScoreName/sep="_"/groupby=index.