Search code examples
google-sheetsfrequencygoogle-formsspss

How to handle (Google Forms - Spreadsheet) "Checkboxes" answers in SPSS


I am analyzing an electronic survey I made using Google Forms and I have the following problem.

One of the questions can take multiple answers in the form of Checkboxes as shown in the picture below. The question is in Greek so I have added some Choice1, Choice2, Choice3 etc next to each answer in order to facilitate my question.

enter image description here

In my data when someone chose lets say Choice1 and Choice2, I will have an answer which is the concatenation of the strings he checked seperated with commas.

In this case it would be:

Choice1, Choice2

If someone else checked Choice1, Choice2 and Choice4 his answer in my data would be:

Choice1, Choice2, Choice4

The problem is SPSS has no way of seperating the substrings (seperated by commas) and understanding which Choices each case has in common. Or maybe there is a way but I don't know it :)

When I, for example, do a simple frequency analysis for this question it produces a table that perceives

Choice1, Choice2

as a completely different case from

Choice1, Choice2, Choice4

Ideally I would like to somehow tell SPSS to count the frequency of each unique Choice (Choice1, Choice2, Choice3 etc etc) rather than each unique combination of those Choices. Is that possible? And if it is can you point me to the documentation I need to study to make it happen?

Thx a lot!


Solution

  • Imagine you are working with the following data, which is a CSV file you have downloaded from your online form. Copy and paste the text below and save it to a text file named "CourseInterestSurvey.CSV".

    Timestamp,Which courses are you interested in?,What software do you use?
    12/28/2012 11:57:56,"Research Methods, Data Visualization","Gnumeric, SPSS, R"
    12/28/2012 11:58:09,Data Visualization,"SPSS, Stata, R"
    12/28/2012 11:59:09,"Research Dissemination, Graphic Design",Adobe InDesign
    12/28/2012 11:59:27,"Data Analysis, Data Visualization, Graphic Design","Excel, OpenOffice.org/Libre Office, Stata"
    12/28/2012 11:59:44,Data Visualization,"R, Adobe Illustrator"
    

    Read it into SPSS using the following syntax:

    GET DATA
      /TYPE=TXT
      /FILE="path\to\CourseInterestSurvey.CSV"
      /DELCASE=LINE
      /DELIMITERS=","
      /QUALIFIER='"'
      /ARRANGEMENT=DELIMITED
      /FIRSTCASE=2
      /IMPORTCASE=ALL
      /VARIABLES=
      Timestamp A19
      CourseInterest A49
      Software A41.
    CACHE.
    EXECUTE.
    DATASET NAME DataSet2 WINDOW=FRONT.
    LIST.
    

    It currently looks like the image below--three columns (one timestamp, and two with the data we want):

    enter image description here

    Working with some syntax from here, we can split the cells up as follows:

    * We know the string does not excede 50 characters.
    * We got that information while we were reading our data in.
    STRING #temp(a50). 
    * We're going to work on the "CourseInterest" variable.
    COMPUTE #temp=CourseInterest. 
    * We're going to create 3 new variables with the prefix "CourseInterest".
    * You should modify this according to the actual number of options your data has
    * and the maximum length of one of the strings in your data.
    VECTOR CourseInterest(3, a25). 
    * Here's where the actual variable creation takes place.
    LOOP #i = 1 TO 3. 
    .  COMPUTE #index=index(#temp,","). 
    .  DO IF   #index GT 0. 
    .    COMPUTE CourseInterest(#i)=LTRIM(substr(#temp,1, #index-1)). 
    .    COMPUTE #temp=substr(#temp, #index+1). 
    .  ELSE. 
    .    COMPUTE CourseInterest(#i)=LTRIM(#temp). 
    .    COMPUTE #temp=''. 
    .  END IF. 
    END LOOP IF #index EQ 0. 
    LIST.
    

    The result:

    enter image description here

    This only addresses one column at a time, and I'm not familiar enough to modify it to work over multiple columns. However, if you were to switch over to R, I already have some readymade functions to help deal with exactly these kinds of situations.