Mapping Python for loops through an SPSS regression


I need to run two loops through my regression: one over the dependent variables, and one over a suffix for the prediction I need to save on each pass. Each loop works fine on its own, but not when I combine them in the same regression. I think the problem is in how the loop variables are mapped into the syntax string after the % at the end of my regression. I get the error "TypeError: list indices must be integers, not str", presumably because my dependent variables are read as strings in order to get the values from the SPSS data frame. Is there any way to map a for loop into a regression that includes string variables?

I have tried using the map() function, but I got an error saying that iteration is not supported.

begin program.
import spss,spssaux
dependent = ['dv1', 'dv2', 'dv3', 'dv4', 'dv5']
spssSyntax = ''
depList = spssaux.VariableDict(caseless = True).expand(dependent)
varSuffix = [1,2,3,4,5]


for dep in depList:
    for var in varSuffix:
        spssSyntax += '''
    REGRESSION 
      /MISSING LISTWISE 
      /STATISTICS COEFF OUTS R
      /CRITERIA=PIN(.05) POUT(.10) 
      /NOORIGIN 
      /DEPENDENT %(dep)s 
      /METHOD=FORWARD  iv1 iv2 iv3
      /SAVE PRED(PRE_%(var)d).
    '''%(depList[dep],varSuffix[var])
end program. 

I get the error code 'TypeError: list indices must be integers, not str' with the code above. How do I map the loop while also including a string?


Solution

  • In Python, when you loop directly over an iterable, the loop variable holds the current value, not its position. There is therefore no need to index the original lists with depList[dep] and varSuffix[var]; use the loop variables dep and var directly.

    Additionally, consider str.format for string interpolation, which is the preferred method in Python 3 over the older, de-emphasized (but not deprecated) string modulo operator %:

    for dep in depList:
        for var in varSuffix:
            spssSyntax += '''REGRESSION 
                               /MISSING LISTWISE 
                               /STATISTICS COEFF OUTS R
                               /CRITERIA=PIN(.05) POUT(.10) 
                               /NOORIGIN 
                               /DEPENDENT {0} 
                               /METHOD=FORWARD  iv1 iv2 iv3
                               /SAVE PRED(PRE_{1}).
                         '''.format(dep, var)
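To see where the TypeError comes from without SPSS running, here is a minimal sketch that uses plain Python lists as stand-ins for depList and varSuffix (the names dep_list, var_suffix and the shortened template strings are mine, not from the question). It also shows a second thing to watch for: named placeholders such as %(dep)s expect a dict on the right of %, not a tuple.

```python
# Plain lists stand in for the SPSS variable list (assumption for illustration).
dep_list = ['dv1', 'dv2', 'dv3']
var_suffix = [1, 2, 3]

# Iterating yields the elements themselves, so `dep` is a string like 'dv1'.
error_message = None
for dep in dep_list:
    try:
        dep_list[dep]  # same mistake as depList[dep] in the question
    except TypeError as exc:
        error_message = str(exc)
    break  # one failing lookup is enough to show the problem

# Using the loop variables directly gives every (dependent, suffix) pair.
pairs = [(dep, var) for dep in dep_list for var in var_suffix]

# Named % placeholders need a mapping; positional {0}/{1} fields take arguments.
with_percent = '/DEPENDENT %(dep)s /SAVE PRED(PRE_%(var)d).' % {'dep': 'dv1', 'var': 1}
with_format = '/DEPENDENT {0} /SAVE PRED(PRE_{1}).'.format('dv1', 1)

print(error_message)
print(with_percent == with_format)
```

Both formatting styles produce the same text here, so the choice between them is mostly a matter of taste once the indexing is removed.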
    

    Alternatively, consider combining the two lists into a single loop with itertools.product, then build the string with a list comprehension and join instead of concatenating on every iteration with +=:

    from itertools import product
    import spss,spssaux
    
    dependent = ['dv1', 'dv2', 'dv3', 'dv4', 'dv5']    
    depList = spssaux.VariableDict(caseless = True).expand(dependent)
    varSuffix = [1,2,3,4,5]
    
    base_string = '''REGRESSION 
                       /MISSING LISTWISE 
                       /STATISTICS COEFF OUTS R
                       /CRITERIA=PIN(.05) POUT(.10) 
                       /NOORIGIN 
                       /DEPENDENT {0} 
                       /METHOD=FORWARD  iv1 iv2 iv3
                       /SAVE PRED(PRE_{1}).
                  '''
    
    # LIST COMPREHENSION UNPACKING TUPLES TO FORMAT BASE STRING
    # JOIN RESULTING LIST WITH LINE BREAKS SEPARATING ITEMS
    spssSyntax = "\n".join([base_string.format(*dep_var) 
                               for dep_var in product(depList, varSuffix)])
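It is worth checking what product actually generates before submitting anything. The sketch below uses plain lists and a shortened one-line template as stand-ins (dep_list, var_suffix and template are illustrative names, not part of the original code). It counts the commands and the distinct PRE_ names they would save.

```python
from itertools import product

# Stand-ins for depList and varSuffix (assumption: five dependents, five suffixes).
dep_list = ['dv1', 'dv2', 'dv3', 'dv4', 'dv5']
var_suffix = [1, 2, 3, 4, 5]

# Shortened stand-in for base_string, keeping only the two substituted fields.
template = "/DEPENDENT {0} /SAVE PRED(PRE_{1})."

# product pairs every dependent with every suffix (a full cross join).
commands = [template.format(dep, var) for dep, var in product(dep_list, var_suffix)]
saved_names = ["PRE_{0}".format(var) for _, var in product(dep_list, var_suffix)]

print(len(commands), "commands,", len(set(saved_names)), "distinct saved names")
```

With five dependents and five suffixes, the cross join repeats each PRE_ name once per dependent, so the saved variable names may collide in the active dataset. If the intent is one prediction per dependent variable, pairing the lists is probably the better fit.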
    

    If you instead need to iterate in parallel, element by element, over the two equal-length lists, consider zip instead of product:

    spssSyntax = "\n".join([base_string.format(d,v) 
                               for d,v in zip(depList, varSuffix)])
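As a quick illustration of how zip pairs things up, here is a small sketch with plain lists standing in for depList and varSuffix (the variable names are illustrative). It also shows that zip silently stops at the shorter input, which is why the lists should be the same length.

```python
# Stand-ins for depList and varSuffix (assumption for illustration).
dep_list = ['dv1', 'dv2', 'dv3', 'dv4', 'dv5']
var_suffix = [1, 2, 3, 4, 5]

# zip walks both lists in step: first with first, second with second, and so on.
pairs = list(zip(dep_list, var_suffix))

# If one list is shorter, the extra items of the longer one are dropped.
truncated = list(zip(dep_list, [1, 2]))

print(pairs)
print(truncated)
```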
    

    Or use enumerate to generate the suffix from the index number:

    spssSyntax = "\n".join([base_string.format(d,i+1) 
                               for i,d in enumerate(depList)])
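A small variation, sketched with stand-in names (dep_list, template): enumerate accepts a start argument, which avoids the i+1 arithmetic. The last comment lines show where spss.Submit would come in inside a BEGIN PROGRAM block. That step is left commented out because it is my assumption about how the generated syntax gets run (the question does not show it), and the spss module only exists inside SPSS.

```python
# Stand-in for depList (assumption for illustration).
dep_list = ['dv1', 'dv2', 'dv3', 'dv4', 'dv5']

# Shortened stand-in for base_string, keeping only the two substituted fields.
template = "/DEPENDENT {0} /SAVE PRED(PRE_{1})."

# start=1 makes the counter begin at 1 instead of 0.
commands = [template.format(dep, n) for n, dep in enumerate(dep_list, start=1)]
spss_syntax = "\n".join(commands)

print(spss_syntax)

# Inside SPSS, the generated syntax could then be run with:
# import spss
# spss.Submit(spss_syntax)
```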