Lag over columns/variables in SPSS


I want to do something I thought was really simple. My (mock) data looks like this:

data list free/totalscore.1 to totalscore.5.
begin data.
1 2 6 7 10 1 4 9 11 12 0 2 4 6 9   
end data.

These are total scores accumulated over a number of trials (in this mock data, trials 1 to 5). Now I want to know the score earned in each trial. In other words, I want to subtract the value for trial n from the value for trial n+1. The simplest syntax would look like this:

COMPUTE trialscore.1 = totalscore.2 - totalscore.1.
EXECUTE. 

COMPUTE trialscore.2 = totalscore.3 - totalscore.2.
EXECUTE. 

COMPUTE trialscore.3 = totalscore.4 - totalscore.3.
EXECUTE. 

And so on, so that each trialscore variable holds the difference between two consecutive totalscore variables.

But of course it is neither practical nor fun to do this for 200+ variables. I attempted to write syntax using VECTOR and DO REPEAT as follows:

COMPUTE #y = 1.
VECTOR totalscore = totalscore.1 to totalscore.5. 
DO REPEAT trialscore = trialscore.1 to trialscore.5.
COMPUTE #y = #x + 1. 
END REPEAT. 
COMPUTE trialscore(#i) = totalscore(#y) - totalscore(#i). 
EXECUTE.

But it doesn't work. Any help is appreciated.

P.S. I've looked into using LAG, but that works across rows (cases), while I need to go across columns, one variable at a time.


Solution

  • I am assuming respid is your original (unique) record identifier.

    EDIT:

    If you do not have a record identifier, you can very easily create a dummy one:

    compute respid=$casenum.
    exe.
    

    end of EDIT

    You could try re-structuring the data, so that each score is a distinct record:

    varstocases
    /make totalscore from totalscore.1 to totalscore.5
    /index=scorenumber
    /NULL=keep.
    exe.
    

    Then sort your cases so that the scores are in descending order within each respid (in order to be able to use the lag function):

    sort cases by respid (a) scorenumber (d).
    

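    Then do the lag-based computation. Because the scores are sorted in descending order, lag(totalscore) holds the total of the next trial, so the current total is subtracted from it:

    do if respid=lag(respid).
        compute trialscore=lag(totalscore)-totalscore.
    end if.
    exe.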
    

    Finally, undo the restructuring:

    casestovars
    /id=respid
    /index=scorenumber.
    exe.
    

    You should end up with a set of trialscore variables alongside the original totalscore variables. The last trialscore variable will be empty, because there is no later trial to subtract from.
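
    If restructuring feels too heavy, a possible alternative is to loop over the columns directly with VECTOR and LOOP. This is only a minimal sketch under a few assumptions: the totalscore variables are adjacent in the file, there are 5 of them as in the mock data, and tot is an arbitrary vector name chosen here for illustration. Note that the short form of VECTOR creates new variables named trialscore1 to trialscore4 (without the dot).

    * Sketch only: tot is a hypothetical vector name, adjust 5 and 4 to your number of trials.
    vector tot = totalscore.1 to totalscore.5.
    vector trialscore(4).
    loop #i = 1 to 4.
        compute trialscore(#i) = tot(#i + 1) - tot(#i).
    end loop.
    exe.

    For the first mock case (1 2 6 7 10) this should give trial scores of 1, 4, 1 and 3.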