Merge SPSS variables if they don't exist in the original file, using Python


I have an SPSS file from which I am removing unwanted variables, but I also want to bring in variables from a second file if they don't already exist in the first. I am looking for some Python code to put in my syntax that says: keep all the variables from a list, and if any of them don't exist in the first file, merge them in from the second file. (Python rookie here.)

Thanks!


Solution

  • Here's an approach to get you started:

    DATA LIST FREE / ID A B C D E.
    BEGIN DATA
    1 11 12 13 14 15
    END DATA.
    DATASET NAME DS1.
    
    DATA LIST FREE /  ID D E F G H.
    BEGIN DATA
    1 24 25 26 27 28
    END DATA.
    DATASET NAME DS2.
    
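    BEGIN PROGRAM PYTHON.
    import spssaux, spss
    
    # Collect the variable names of each dataset.
    spss.Submit("dataset activate ds1.")
    ds1vars=[v.VariableName for v in spssaux.VariableDict()]
    spss.Submit("dataset activate ds2.")
    ds2vars=[v.VariableName for v in spssaux.VariableDict()]
    
    # Variables that exist only in DS2 (this comparison is case-sensitive).
    extravars = [v for v in ds2vars if v not in ds1vars]
    
    # Reduce DS2 to ID plus the extra variables, then match it to DS1 by ID.
    # MATCH FILES with BY expects both datasets to be sorted by ID.
    spss.Submit("""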
    
    DATASET ACTIVATE DS2.
    ADD FILES FILE=* /KEEP=ID %s.
    MATCH FILES FILE=DS1 /TABLE DS2 /BY ID.
    DATASET NAME DS3.
    DATASET ACTIVATE DS3.
    
    """ % (" ".join(extravars) ) )
    
    END PROGRAM PYTHON.
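
The code above pulls in every variable that exists only in DS2. The question also mentions a list of variables to keep, so here is a minimal sketch of that selection step in plain Python. It runs without SPSS; `split_wanted` and the `wanted` list are hypothetical names made up for illustration, not part of `spss` or `spssaux`. The variable lists are typed in by hand to match the example datasets, whereas in a real job they would come from `spssaux.VariableDict()` as shown above.

```python
def split_wanted(wanted, ds1vars, ds2vars):
    """Split a wanted-variable list by where each variable can be found.

    Hypothetical helper, not part of spss or spssaux. Names are compared
    case-insensitively because SPSS variable names are not case-sensitive.
    """
    ds1 = {v.lower() for v in ds1vars}
    ds2 = {v.lower() for v in ds2vars}
    # Already present in the first file.
    in_ds1 = [v for v in wanted if v.lower() in ds1]
    # Absent from the first file but available in the second.
    from_ds2 = [v for v in wanted
                if v.lower() not in ds1 and v.lower() in ds2]
    # Found in neither file.
    missing = [v for v in wanted
               if v.lower() not in ds1 and v.lower() not in ds2]
    return in_ds1, from_ds2, missing


# Variable lists matching the DS1 and DS2 example datasets.
ds1vars = ["ID", "A", "B", "C", "D", "E"]
ds2vars = ["ID", "D", "E", "F", "G", "H"]
wanted = ["A", "D", "F", "h", "Z"]  # hypothetical list of variables to keep

in_ds1, from_ds2, missing = split_wanted(wanted, ds1vars, ds2vars)

# Same shape as the KEEP clause of the ADD FILES command in the answer.
keep_clause = "/KEEP=ID " + " ".join(from_ds2)

print(in_ds1, from_ds2, missing)
print(keep_clause)
```

Using `from_ds2` in place of `extravars` in the `spss.Submit` call would restrict the merge to the wanted variables only. It is probably worth checking `missing` before submitting any syntax, since a name that exists in neither file would make the KEEP clause fail.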