
Python module savReaderWriter causing Segmentation fault


I am using Python 2.7 on Ubuntu. I have a script that writes an SPSS .sav file. If I use ValueLabels with numbers as keys like this:

{1: 'yes', 2: 'no'}

the following line causes a Segmentation fault:

with savReaderWriter.SavWriter(sav_file_name, varNames, varTypes, valueLabels=value_labels, ioUtf8=True) as writer:

However, if my keys are strings like this:

{'1': 'yes', '2': 'no'}

I do not get the segmentation fault, and my script runs fine. The problem, of course, is that I need the keys to be numbers. How can I fix or work around this?

Thank you in advance.

-RLS


Solution

  • Depending on whether you specify a numerical variable (varType == 0) or a string variable (varType > 0, where varType is the length in bytes of the string value), one of the following two C functions of the SPSS I/O library is called:

    • int spssSetVarNValueLabel(int handle, const char * varName, double value, const char * label)
    • int spssSetVarCValueLabel(int handle, const char * varName, const char * value, const char * label)

    Note that ctypes.c_double accepts both floats and ints, so the values of numerical variables do not have to be specified as floats (doubles); they can also be ints.
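
As a small illustration of that point, the sketch below uses only the standard ctypes module (savReaderWriter itself is not involved). The variable names are made up for the example.

```python
import ctypes

# c_double converts a Python int to a C double, so 1 and 1.0 end up identical.
from_int = ctypes.c_double(1)
from_float = ctypes.c_double(1.0)
print(from_int.value, from_float.value)

# A bytes object cannot be converted to a double, so a string-style key
# is not a valid value for a numerical variable.
try:
    ctypes.c_double(b"1")
    bytes_accepted = True
except TypeError:
    bytes_accepted = False
print(bytes_accepted)
```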

    It appears that you specified a varType > 0 (indicating a string variable), but a value label key that is an int (suggesting a numerical variable). The fix is to make the two consistent. One way is already stated above (use string keys); the other way is to set the varType for the variable in question to zero.
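
One way to keep the two consistent is to coerce the keys before handing them to the writer. The helper below is hypothetical (normalize_value_labels is not part of savReaderWriter) and only a sketch. It assumes varTypes maps each variable name to 0 for numerical variables and to a positive length for string variables, as described above.

```python
def normalize_value_labels(value_labels, var_types):
    """Hypothetical helper: coerce label keys to match each variable's type."""
    normalized = {}
    for var_name, labels in value_labels.items():
        if var_types[var_name] == 0:
            # Numerical variable: keys must be numbers.
            normalized[var_name] = {float(k): v for k, v in labels.items()}
        else:
            # String variable: keys must be byte strings.
            normalized[var_name] = {
                k if isinstance(k, bytes) else str(k).encode("utf-8"): v
                for k, v in labels.items()
            }
    return normalized


var_types = {b"a_string": 1, b"a_numeric": 0}
mixed_up = {b"a_numeric": {b"1": b"male", b"2": b"female"},
            b"a_string": {1: b"male", 2: b"female"}}
fixed = normalize_value_labels(mixed_up, var_types)
print(fixed)
```

The result can then be passed as the valueLabels argument in place of the original dictionary.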

    That said, a segfault is an ugly way to fail. I have put it on my to-do list to specify the argtypes attribute for all the setter functions (see 15.17.1.6 on https://docs.python.org/2/library/ctypes.html), so that you get a nice, understandable ArgumentError instead of this nasty segfault.
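
To show what declaring argtypes buys you, here is a minimal sketch that does not load the SPSS I/O library at all. CPython's own PyFloat_FromDouble, reached through ctypes.pythonapi, serves purely as a stand-in for a C function that takes a double. The names c_func, ok_result and raised are made up for the example.

```python
import ctypes

# Stand-in for a C setter that takes a double (assumption: CPython is used).
c_func = ctypes.pythonapi.PyFloat_FromDouble
c_func.argtypes = [ctypes.c_double]  # declare the expected C argument types
c_func.restype = ctypes.py_object    # the C function returns a Python object

# An int is converted to a C double before the call.
ok_result = c_func(1)
print(ok_result)

# A bytes argument is rejected before the C function is ever called.
try:
    c_func(b"1")
    raised = False
except ctypes.ArgumentError:
    raised = True
print(raised)
```

Without declared argtypes, ctypes has to guess how to pass each Python object, which is presumably how an int key ended up being treated as a char * and crashed the process.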

    If the problem persists, could you please open an issue at https://bitbucket.org/fomcl/savreaderwriter/issues?status=new&status=open, with a minimal example?

    @ekhumoro: savReaderWriter has not been tested for Python 2.6 or earlier (I would be surprised if it works), so a dict comprehension should be fine.

    UPDATE: @RLS: You are welcome. Thank you too, it inspired me to correct this. As of commit 5c11704 this now throws a ctypes.ArgumentError (see https://bitbucket.org/fomcl/savreaderwriter). Here is an example that I might also use to write a unittest for this (the b"..." prefixes are needed for Python 3):

    import os
    import pprint
    import tempfile

    import savReaderWriter as rw
    
    savFileName = os.path.join(tempfile.gettempdir(), "some_file.sav")
    varNames = [b"a_string", b"a_numeric"]
    varTypes = {b"a_string": 1, b"a_numeric": 0}
    records = [[b"x", 1], [b"y", 777], [b"z", 10 ** 6]]
    
    # Incorrect, but now raises ctypes.ArgumentError:
    valueLabels = {b"a_numeric": {b"1": b"male", b"2": b"female"},
                   b"a_string": {1: b"male", 2: b"female"}}
    
    # Correct
    #valueLabels = {b"a_numeric": {1: b"male", 2: b"female"},
    #               b"a_string": {b"1": b"male", b"2": b"female"}}
    
    kwargs = dict(savFileName=savFileName, varNames=varNames, 
                  varTypes=varTypes, valueLabels=valueLabels)
    with rw.SavWriter(**kwargs) as writer:
        writer.writerows(records)
    
    # Check if the valueLabels look all right
    with rw.SavHeaderReader(savFileName) as header:
        metadata = header.dataDictionary(True)
        pprint.pprint(metadata.valueLabels)