
Using R to parse SurveyMonkey CSV files


I'm trying to analyse a large survey created with SurveyMonkey. The CSV export has hundreds of columns, and the format is difficult to use because the headers run over two lines.

  • Has anybody found a simple way of handling the two-line headers in the CSV file so that the analysis is manageable?
  • How do other people analyse results from SurveyMonkey?

Thanks!


Solution

  • What I did in the end was print out the headers using LibreOffice, labelled as V1, V2, etc. Then I just read in the file as

     m1 <- read.csv('Sheet1.csv', header=FALSE, skip=1)
    

    and then just did the analysis against m1$V10, m1$V23 etc...
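
    An alternative to looking headers up on a printout is to build one header per column from the two header lines. The following is only a sketch, assuming the first header row holds the question text (blank for the follow-on columns of a multi-column question) and the second row holds the answer option. The sample data and the names `csv_text`, `hdr`, `headers` and `survey` are made up for illustration; with a real export, replace `text = csv_text` with the file name.

    ```r
    # Made-up sample mimicking a two-line-header export; real column text will differ.
    csv_text <- c(
      'RespondentID,Which colours do you like?,,How satisfied are you?',
      ',Red,Blue,Response',
      '1001,Red,,Agree',
      '1002,,Blue,Strongly Agree'
    )

    # Read the two header rows on their own, forcing every column to character.
    hdr <- read.csv(text = csv_text, header = FALSE, nrows = 2,
                    colClasses = 'character')

    # The question text only appears above the first column of each group,
    # so carry the last non-empty question forward across the blanks.
    question <- as.character(unlist(hdr[1, ]))
    for (j in seq_along(question)[-1]) {
      if (question[j] == '') question[j] <- question[j - 1]
    }
    option <- as.character(unlist(hdr[2, ]))

    # Combine question and option into one header per column.
    headers <- ifelse(option == '', question, paste(question, option, sep = ' - '))

    # Read the data rows, skipping both header lines, and attach the names.
    survey <- read.csv(text = csv_text, header = FALSE, skip = 2,
                       stringsAsFactors = FALSE)
    names(survey) <- make.names(headers, unique = TRUE)
    ```

    `make.names()` turns the combined text into syntactically valid column names, so the columns can be addressed by name rather than as V10, V23 and so on.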

    To get around the mess of multiple columns I used the following little function

    # function to merge columns into one with a space separator and then
    # remove multiple spaces
    mcols <- function(df, cols) {
        # e.g. mcols(df, c(14:18))
        exp <- paste('df[,', cols, ']', sep='', collapse=',')
        # this creates something like...
        # "df[,14],df[,15],df[,16],df[,17],df[,18]"
        # now we just want to do a paste of this expression...
        nexp <- paste(" paste(", exp, ", sep=' ')")
        # so now nexp looks something like...
        # " paste( df[,14],df[,15],df[,16],df[,17],df[,18] , sep=' ')"
        # now we just need to parse this text... and eval() it...
        newcol <- eval(parse(text=nexp))
        newcol <- gsub('  *', ' ', newcol) # replace duplicate spaces by a single one
        newcol <- gsub('^ *', '', newcol) # remove leading spaces
        gsub(' *$', '', newcol) # remove trailing spaces
    }
    # mcols(df, c(14:18))
    

    No doubt somebody will be able to clean this up!
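
    One possible tidy-up, offered as a sketch rather than a definitive version: `do.call()` can pass the selected columns to `paste()` directly, which avoids building and parsing a string of code. The name `mcols2` and the sample data frame are hypothetical, chosen only for this illustration.

    ```r
    # Hypothetical name mcols2: same idea as mcols(), without eval(parse()).
    mcols2 <- function(df, cols) {
      # df[cols] is a list of columns; do.call() hands them to paste() as
      # separate arguments. unname() stops a column name clashing with 'sep'.
      newcol <- do.call(paste, c(unname(as.list(df[cols])), sep = ' '))
      newcol <- gsub(' +', ' ', newcol)  # collapse runs of spaces into one
      gsub('^ +| +$', '', newcol)        # remove leading and trailing spaces
    }

    # Made-up example: one answer spread over three mostly empty columns.
    df <- data.frame(a = c('Red', ''), b = c('', 'Blue'), c = c('Green', ''),
                     stringsAsFactors = FALSE)
    merged <- mcols2(df, 1:3)
    # merged is c("Red Green", "Blue")
    ```

    As with the original, missing values are pasted in as the text "NA", so blank cells should be empty strings rather than NA before merging.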

    To tidy up Likert-like scales I used:

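    # function to tidy c('Strongly Agree', 'Agree', 'Disagree', 'Strongly Disagree')
    # into an ordered factor, with blank answers turned into NA
    tidylik4 <- function(x) {
      xlevels <- c('Strongly Disagree', 'Disagree', 'Agree', 'Strongly Agree')
      # convert first: ifelse() on a factor would return integer codes, not labels
      x <- as.character(x)
      y <- ifelse(x == '', NA, x)
      ordered(y, levels=xlevels)
    }
    
    # m2 is assumed to be a survey data frame read in like m1 above
    for (i in 44:52) {
      m2[,i] <- tidylik4(m2[,i])
    }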
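
    To show what the ordered factor makes possible afterwards, here is a small sketch on made-up data. The data frame `survey` and the columns `q1` and `q2` are hypothetical; the conversion uses `lapply()` in place of the loop, relying on the fact that `ordered()` turns any value outside the given levels (including an empty string) into NA.

    ```r
    # Made-up responses for two hypothetical question columns (q1, q2).
    lik_levels <- c('Strongly Disagree', 'Disagree', 'Agree', 'Strongly Agree')
    survey <- data.frame(
      q1 = c('Agree', '', 'Strongly Agree', 'Disagree', 'Agree'),
      q2 = c('Disagree', 'Agree', '', 'Strongly Disagree', 'Agree'),
      stringsAsFactors = FALSE
    )

    # Convert both columns in one step; values not in lik_levels become NA.
    survey[c('q1', 'q2')] <- lapply(survey[c('q1', 'q2')], ordered,
                                    levels = lik_levels)

    # Counts per answer, in scale order rather than alphabetical order.
    q1_counts <- table(survey$q1, useNA = 'ifany')

    # Numeric scores 1 to 4 for summaries such as the median.
    q1_scores <- as.integer(survey$q1)
    q1_median <- median(q1_scores, na.rm = TRUE)

    # Ordered factors also support comparisons against a level.
    q1_agrees <- survey$q1 >= 'Agree'
    ```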
    

    Feel free to comment as no doubt this will come up again!