Search code examples
syntaxmaxspss

SPSS - How to isolate the maximum value in a variable so you can use it in syntax?


I have a table of areas with data. For a particular operation, I want to exclude the top and bottom 1% of areas, as they include extreme outliers.

Seems to me that a way forward is:

SORT CASES BY theVariableIwantToAnalyse  (A) .
NUMERIC id (F12.0) .  * create a casenum label "id"
COMPUTE id = $CASENUM. * populate it with casenum
EXECUTE.
NUMERIC idmax (F12.4) .   * create a variable to contain the highest value for "id" 
NUMERIC id1perc (F12.4) . * create a variable to contain 1% of the highest value for "id"  
COMPUTE idmax = MAX(id) .    * determine the highest value for id. This 'mock-syntax' line does not work.   
COMPUTE id1perc = idmax / 100 . * 1% of the highest value for "id"  
SELECT CASES WHERE ID >= id1perc or ID <= idmax - id1perc .

Draw graphs etc. I then need to

SORT CASES BY theNextVariableIwantToAnalyse  (A) .
COMPUTE id = $CASENUM. * populate it with the NEW casenum order
EXECUTE.

etc ...


Solution

  • Try this to simply filter out the top and bottom 1% - just add FILTER BY filter. to turn off all extreme cases, or SELECT IF filter. ... EXECUTE. to delete them

    EDIT: note that repeated values will be condensed by the RANK method (specifically the /TIESoption). This might not be ideal if you have the possibility of repeated values. Change the /TIES option if that's the case.

    ************* GENERATE RANDOM DATA *****************.
    INPUT PROGRAM.
    -       LOOP #I = 1 TO 1000.
    -             COMPUTE Y = RV.NORMAL(100,10).
    -           END CASE.
    -       END LOOP.
    -       END FILE.
    END INPUT PROGRAM.
    
    dataset name exampleData WINDOW=front.
    EXECUTE.
    
    
    ************* RANK DATA  *************.
    DATASET ACTIVATE exampleData.
    RANK VARIABLES=Y (A)
      /RFRACTION INTO fractile
      /TIES=CONDENSE.
    
    ************* MAKE A FILTER  *************.
    COMPUTE filter = (fractile>0.01 AND fractile < 0.99).
    EXECUTE.
    
    * Chart Builder.
    GGRAPH
      /GRAPHDATASET NAME="graphdataset" VARIABLES=Y filter MISSING=LISTWISE REPORTMISSING=NO
      /GRAPHSPEC SOURCE=INLINE.
    BEGIN GPL
      SOURCE: s=userSource(id("graphdataset"))
      DATA: Y=col(source(s), name("Y"))
      DATA: filter=col(source(s), name("filter"), unit.category())
      GUIDE: axis(dim(1), label("Y"))
      GUIDE: axis(dim(2), label("Frequency"))
      GUIDE: legend(aesthetic(aesthetic.color.interior), label("filter"))
      ELEMENT: interval.stack(position(summary.count(bin.rect(Y))), color.interior(filter), 
        shape.interior(shape.square))
    END GPL.