
Compute variable mean with case removed in SPSS...many times


I hope this is an easy question but I'm having trouble creating SPSS syntax for it.

I have a dataset with a single variable and about 200 cases. I need to compute the mean of that variable, but I need to compute the mean 200 times such that it is computed once with each case removed. So the mean needs to be computed 200 times, removing each case once (and then replacing it) and calculating the mean with that case missing. In other words, the first time I compute the mean it should exclude the first case (so cases 2 through 200 are analyzed). The second time I compute the mean it should exclude the second case but include the first case (so cases 1 and 3 through 200 are analyzed). And so on.

Ideally what I would like to do is create a new SPSS dataset, such that the only variable in this new dataset contains these 200 means. I believe the best way to do this is through the aggregate function.

What I am having trouble with is how to remove each case, compute the mean, replace the case, compute the mean again with another case removed, and so on. I could do this with a filter, but I would like to automate it rather than having to copy/paste or change the syntax each time. I am thinking of some kind of repeating filter, but I am not very familiar with repeat and loop commands (but working on it...).

Any insight or help about the best way to create a filter like this would be much appreciated.


Solution

  • As I suggested in my comment, you can use the deletion statistics available in the REGRESSION procedure to get the information you need without having to loop through the dataset yourself.

    What you have to do is compute your own constant equal to 1 and force the REGRESSION through the origin (SPSS does not let you specify an empty regression equation), with your variable of interest as the dependent variable. Then have the regression procedure save the deletion residuals. Subtracting each case's deletion residual from its original value gives the jackknifed mean, i.e. the mean with that observation deleted.

    In a nutshell, this code provides that information; just replace X with your variable of interest.

    COMPUTE Const = 1.
    REGRESSION
      /ORIGIN 
      /DEPENDENT X
      /METHOD=ENTER Const
      /SAVE DRESID (MeanResid).
    COMPUTE JackknifeMeanX = X - MeanResid.
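
    A short sketch of why the subtraction works, using standard least-squares results rather than anything specific to SPSS. With the constant as the only predictor and no separate intercept, every fitted value is the overall mean and every case has leverage 1/n. The notation here (x_i for the value of case i, n for the number of cases, d_i for the deletion residual, h_i for the leverage) is introduced only for this illustration:

    ```latex
    \begin{aligned}
    d_i &= \frac{x_i - \bar{x}}{1 - h_i} = \frac{n\,(x_i - \bar{x})}{n - 1}, \qquad h_i = \frac{1}{n},\\
    x_i - d_i &= \frac{(n - 1)\,x_i - n\,x_i + n\,\bar{x}}{n - 1}
               = \frac{n\,\bar{x} - x_i}{n - 1}
               = \bar{x}_{(-i)} .
    \end{aligned}
    ```

    Here n x̄ − x_i is the sum of all values except case i, so the last expression is the mean of the remaining n − 1 cases.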
    

    A full example (with simulated data and a check via AGGREGATE) is below:

    INPUT PROGRAM.
    LOOP Id = 1 TO 10.
    END CASE.
    END LOOP.
    END FILE.
    END INPUT PROGRAM.
    DATASET NAME Sim.
    COMPUTE X = RV.NORMAL(10,5).
    COMPUTE Const = 1.
    FORMATS Id Const (F2.0).
    EXECUTE.
    
    *Using deletion residuals in linear regression to calculate Jackknifed mean.
    *Here I calculate my own intercept and force through origin.
    REGRESSION
      /ORIGIN 
      /DEPENDENT X
      /METHOD=ENTER Const
      /SAVE DRESID (MeanResid).
    COMPUTE JackknifeMeanX = X - MeanResid.
    
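    *Checking to make sure this agrees with the data.
    *XMis(i) copies X for every case except case i, so its mean is the mean with case i removed.
    VECTOR XMis(10).
    LOOP #i = 1 TO 10.
      IF ($casenum <> #i) XMis(#i) = X.
    END LOOP.
    *After AGGREGATE, XMis1 to XMis10 hold those means and XMis(i) should equal JackknifeMeanX for case i.
    AGGREGATE OUTFILE = * OVERWRITE=YES MODE=ADDVARIABLES
      /BREAK
      /XMis1 TO XMis10=MEAN(XMis1 TO XMis10).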
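
    The vector-based check creates one column per case, which gets unwieldy at 200 cases. A shorter cross-check, sketched here as an alternative and not part of the approach above, uses the fact that the mean with case i removed equals (sum of X minus that case's X) divided by (N minus 1). The variable names SumX, NX and LooMeanX are hypothetical, and the sketch assumes the active dataset still contains X:

    ```spss
    * Assumption: X is the variable of interest in the active dataset.
    * SumX, NX and LooMeanX are illustrative names only.
    AGGREGATE OUTFILE = * MODE=ADDVARIABLES
      /BREAK
      /SumX = SUM(X)
      /NX = N(X).
    * Leave-one-out mean: remove the case's own value from the total.
    COMPUTE LooMeanX = (SumX - X) / (NX - 1).
    EXECUTE.
    ```

    If the regression step has been run, LooMeanX should match JackknifeMeanX case by case, up to rounding.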