Search code examples
loopsmacrosspss

Macro to open, recode and stack several .csv files in SPSS


I am trying to code a macro that:

  1. Import the columns year, month, id, value and motive from several .csv sequential files to SPSS. The files are named like: DATA_JAN_2010, DATA_FEB_2010 [...], until DATA_DEC_2019. These are the first variables of the csv files (the code I am using to import this variables is provided in the end).

  2. Alter type of columns id to (a11), motive to (a32), if necessary (needed to stack all files).

  3. Stack all these datasets in a new dataset named: DATA_2010_2019.

For now, what I am doing is to import each file separately, stacking and saving two by two. But this is so repetitive and irrational from the efficiency standpoint. Moreover, if in the future I need to import additional variables, I would need to rewrite all the code for each file. That is why I believe that a loop or a macro would be the smartest way of dealing with this repetitive codes. Any help is really appreciated.

A sample of my code so far:

GET DATA  /TYPE=TXT
  /FILE="C:\Users\luizz\DATA\DATA_JAN_2010.csv"
  /ENCODING='Locale'
  /DELCASE=LINE
  /DELIMITERS=";"
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=
  YEAR F4.0
  MONTH F1.0
  ID A11
  VALUE F4.0
  MOTIVE A8.
CACHE.
EXECUTE.
DATASET NAME JAN_2010 WINDOW=FRONT.

ALTER TYPE MOTIVE (a32).

GET DATA  /TYPE=TXT
  /FILE="C:\Users\luizz\DATA\DATA_FEB_2010.csv"
  /ENCODING='Locale'
  /DELCASE=LINE
  /DELIMITERS=";"
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=
  YEAR F4.0
  MONTH F1.0
  ID A11
  VALUE F4.0
  MOTIVE A8.
CACHE.
EXECUTE.
DATASET NAME FEB_2010 WINDOW=FRONT.

DATASET ACTIVATE FEB_2010.
ALTER TYPE MOTIVE (a32).

DATASET ACTIVATE JAN_2010.
ADD FILES /FILE=*
  /FILE='FEB_2010'.
EXECUTE.

SAVE OUTFILE='C:\Users\luizz\DATA\DATA_JAN_FEV_2010.sav'
  /COMPRESSED.

Solution

  • Assuming the parameters for all the files are the same, you can use a macro like this:

    define !getfiles ()
    !do !yr=2010 !to 2019
    !do !mn !in("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")
    GET DATA 
      /TYPE=TXT /FILE=!concat('"C:\Users\luizz\DATA\DATA_', !mn, '_', !yr, '.csv"')
      /ENCODING='Locale'   /DELCASE=LINE   /DELIMITERS=";"   /ARRANGEMENT=DELIMITED
      /FIRSTCASE=2   /IMPORTCASE=ALL   /VARIABLES= 
      YEAR F4.0
      MONTH F1.0
      ID A11
      VALUE F4.0
      MOTIVE A8.
    CACHE.
    EXECUTE.
    ALTER TYPE id (a11) MOTIVE (a32).
    dataset name tmp.
    dataset activate gen.
    add files /file=* /file=tmp.
    exe.
    !doend !doend
    !enddefine.
    

    The macro as defined will read each of the files and add it to a main file. Before we call the macro we will create the main file:

    data list list/YEAR (F4)  MONTH (F1) ID (A11) VALUE (F4) MOTIVE (A8).
    begin data
    end data.
    exe.
    dataset name gen.
    * now we can call the macro.
    !getfiles .
    * now the data is all combined and we can save it.
    SAVE OUTFILE='C:\Users\luizz\DATA\DATA_JAN_FEV_2010.sav'  /COMPRESSED.
    

    NOTE: I used your code from the original post in the macro. Please make sure all the definitions are right.