Search code examples
pythoncsvpandasqualtrics

How do you import a Qualtrics csv file into a pandas dataframe?


Qualtrics is a fairly popular survey platform. You can download your survey data as CSV files. There are a couple of quirks about Qualtrics CSV files:

  1. The begin with the BOM character
  2. They include an extra row of information to explain what the variables are
  3. They frequently included parentheses and periods in column names.

I've been able to deal with #1 and #2 with the following code:

import pandas as pd
df = pd.read_csv('qualtrics_survey.csv', skiprows=[1], encoding='utf-8-sig')

I run the following code, I see a list of all columns, includeing parentheses and period.

list(df.columns.values)

There is a column called turk.1. However, I cannot run:

df.turk.1

I'm not sure what the best way is to load the files. I'd be fine removing all parenthesis, and replacing periods with dashes or something.


Solution

  • You can just use the df['col'] notation instead of df.col to select a column. For this reason, this notation is actually preferred.

    If you don't want that, you can also rename your columns with the rename method after reading in the data. You can do this manually:

    df = df.rename(columns={'turk.1': 'other_name'})
    

    or provide eg a function to replace all periods with underscores:

    df = df.rename(columns=lambda x: x.replace('.', '_'))