Tags: r, count, missing-data, frequency-distribution

Processing survey multiple-choice data in R


I need to analyse survey data to get the frequency of each answer to a multiple-choice question. I'm using the questionr R package.

I understand that I need to use the 'multi.split' function to create the variable that I will be working with, but I need to know how to make it include answers that are not in the data set: answers that were part of the original question but were not selected by anyone during the survey, and therefore should be displayed with the value 0.

Example: I have the following possible answers:

"red", "blue", "green" and "yellow" 

and my data is (like in the example):

v <- c("red/blue","green","red/green","blue/red")

when I run this command:

multi.table(multi.split(v))

I get the following result:

        n     %multi
v.blue  2     50
v.red   3     75
v.green 2     50

but I would like to get:

         n     %multi
v.blue   2     50
v.red    3     75
v.green  2     50
v.yellow 0      0

Any ideas on how I can do that?


Solution

  • I have never used this package before but I'll give it a try.

    The function multi.split() produces a data.frame, so if you want to add another column before getting the statistics you could do something like the following:

    v <- c("red/blue","green","red/green","blue/red")
    a <- multi.split(v)
    a$v.yellow <-  0
    multi.table(a)
    
    
    ## > multi.table(a)
    ##          n %multi
    ## v.blue   2     50
    ## v.red    3     75
    ## v.green  2     50
    ## v.yellow 0      0
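
    Why the zero column works can be sketched in base R. This assumes that multi.split() returns one 0/1 indicator column per answer, which the output above suggests but which I have not checked against the package source. The data frame ind below is a hypothetical hand-built stand-in, not the package's actual output:

    ```r
    # Hypothetical stand-in for the indicator columns (assumption: one 0/1
    # column per answer, one row per respondent). Base R only.
    ind <- data.frame(
      v.blue  = c(1, 0, 0, 1),
      v.red   = c(1, 0, 1, 1),
      v.green = c(0, 1, 1, 0)
    )
    ind$v.yellow <- 0             # the single 0 is recycled to every row

    n   <- colSums(ind)           # respondents who chose each answer
    pct <- 100 * n / nrow(ind)    # share of respondents, in percent
    res <- cbind(n = n, `%multi` = pct)
    print(res)
    ```

    An all-zero column simply sums to 0, so the unselected answer shows up with n = 0 and 0 %.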
    

    Update: A more generic version would go something like this:

    1. wanted_data is a character vector of the column names that you always want in your output.
    2. col.to.add holds the wanted columns that are not in the data.frame a.
    3. Assign 0 to the columns that were not present.
    4. Finally, order the columns so they always appear in the same order.

    library(questionr)
    v <- c("red/blue","green","red/green","blue/red")
    # column names that should always appear in the output
    wanted_data <- c("v.red","v.blue","v.green","v.yellow")

    a <- multi.split(v)
    # wanted columns that multi.split() did not create
    col.to.add <- wanted_data[!(wanted_data %in% colnames(a))]
    a[col.to.add] <- 0
    # assign the result, otherwise the reordering is discarded
    a <- a[, order(colnames(a))]
    multi.table(a)

    ## > multi.table(a)
    ##          n %multi
    ## v.blue   2     50
    ## v.green  2     50
    ## v.red    3     75
    ## v.yellow 0      0
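
    As a cross-check that does not depend on the package, the same counts can be sketched in base R by splitting the responses and tabulating them as a factor with explicit levels. This is a different technique from multi.split(), not a description of how the package works. The vector answers is an assumed list of every option offered in the survey, and the sketch assumes no respondent lists the same answer twice:

    ```r
    v <- c("red/blue", "green", "red/green", "blue/red")
    answers <- c("red", "blue", "green", "yellow")  # assumed full set of options

    # Split each response into its answers, then count with explicit factor
    # levels so that options nobody selected are kept with a count of 0.
    picked <- unlist(strsplit(v, "/", fixed = TRUE))
    n <- table(factor(picked, levels = answers))

    # Percentages are relative to the number of respondents, not of picks.
    pct <- 100 * as.vector(n) / length(v)
    res <- cbind(n = as.vector(n), `%multi` = pct)
    rownames(res) <- names(n)
    print(res)
    ```

    Because the levels are given up front, the rows also come out in the order of answers rather than in the order the values happen to appear in the data.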