python pandas dataframe spss

Converting from SPSS to pandas: result gives "b'var_name'" for all variables


I'm trying to convert an SPSS file to a pandas DataFrame, and the conversion itself works. However, every variable name shows up as "b'variable_name'": there is a 'b' in front of each name and single quotes around the original name. Is there a way to do the conversion and keep the original variable names?

I've tried renaming the variables, but the quotes throw off the code. Besides, there are a lot of variables, so renaming them by hand is tedious and not ideal.

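import pandas as pd
import savReaderWriter as s  # the question's "s" alias, assumed to be savReaderWriter

df = pd.DataFrame(list(s.SavReader(r'C:\Users\Nick\Desktop\GitProjects\Data\M2.sav', returnHeader=True,
                                   recodeSysmisTo='NaN', ioUtf8=True, rawMode=True)))
df.head(10)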

# Create a new variable called 'header' from the first row of the dataset
header = df.iloc[0]
# Replace the dataframe with a new one which does not contain the first row
df = df[1:]
# Rename the dataframe's column values with the header variable
M2 = df.rename(columns = header)
M2.head(10)

Here is the resulting DataFrame. It's fine, but I need to get rid of the 'b' and the single quotes around each variable name.
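To see what the b'...' display means without an SPSS file, here is a minimal sketch that runs the same header-promotion steps on a hypothetical list of rows. It assumes the reader's returnHeader=True output puts the variable names, as bytes, in the first row; the rows and values are made up for illustration. It suggests that the labels are bytes objects, not strings that happen to contain a b and quotes.

```python
import pandas as pd

# Hypothetical rows shaped like the reader's output with returnHeader=True
# (assumption): the first row holds the variable names as bytes.
rows = [
    [b'id', b'age', b'score'],
    [1.0, 34.0, 7.5],
    [2.0, 51.0, 6.0],
]

df = pd.DataFrame(rows)
header = df.iloc[0]             # first row becomes the header
df = df[1:]                     # drop the header row from the data
M2 = df.rename(columns=header)  # Series maps 0, 1, 2 -> bytes names

# The labels are bytes objects; pandas prints them as b'...'.
print(list(M2.columns))         # [b'id', b'age', b'score']
print(type(M2.columns[0]))      # <class 'bytes'>
print('age' in M2.columns)      # False: the str 'age' is not the bytes b'age'
```

Because the labels are bytes, looking a column up by its plain string name fails, which is why renaming with quoted strings does not match.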


Solution

  • For a quick fix, replace the header line with this:

    # Decode the bytes header names to str (assumes UTF-8, matching ioUtf8=True)
    header = df.iloc[0].str.decode('utf-8')
    

    The b'' prefix means that all your header names are bytes, not strings. This is probably due to the function used to read the .sav file.
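If you would rather clean the labels after the DataFrame has been built, here is a minimal sketch that decodes every column label in one call, so the number of variables does not matter. The helper name to_text, the sample frame, and the UTF-8 default are illustrative assumptions, not part of savReaderWriter or the answer above.

```python
import pandas as pd

def to_text(label, encoding='utf-8'):
    """Decode a bytes column label to str; leave other labels unchanged."""
    # Assumption: the names are UTF-8 (the question reads with ioUtf8=True);
    # pass a different encoding if your file uses another one.
    if isinstance(label, bytes):
        return label.decode(encoding)
    return label

# Decoding matters: str() on a bytes object returns its repr,
# prefix and quotes included.
print(str(b'age'))       # b'age'
print(to_text(b'age'))   # age

# Hypothetical frame whose column labels came back as bytes.
M2 = pd.DataFrame([[1.0, 34.0], [2.0, 51.0]], columns=[b'id', b'age'])

# rename() accepts a function and applies it to every column label.
M2 = M2.rename(columns=to_text)
print(list(M2.columns))  # ['id', 'age']
```

Using a function rather than a dict means no list of names has to be typed out, and labels that are already strings pass through untouched.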