Search code examples
rdplyrtidyverse

Pulling data from Qualtrics survey but maintaining row ID when extracting count data and categories in R


I have a data frame that I am getting from a survey I made in Qualtrics where we identified the species of fish harvested and number harvested for each angler (ID = individual anglers). I am trying to wrangle the data so that I can build a harvest data frame with the survey data but maintain the survey ID.

Here is what my data looks like:

df<-structure(list(ID = 1:3, Q7 = c("Other species (please list species name; i.e. Tarpon),Other species (please list species name; i.e. Amberjack)", 
"Red Drum (a.k.a. Redfish or Red),Other species (please list species name; i.e. Tarpon),Other species (please list species name; i.e. Amberjack)", 
"Red Drum (a.k.a. Redfish or Red),Other species (please list species name; i.e. Tarpon)"
), Q7_7_TEXT = c("Tink", "Blue", "Blue"), Q7_8_TEXT = c("Chii", 
"Red", NA), Q8_1 = c(NA, "2", "7"), Q8_7 = c("1", "4", "9"), 
    Q8_8 = c("2", "5", NA)), class = "data.frame", row.names = 3:5)

I am using the following code to pull out the species and harvest numbers for each species so that the information is housed in a data frame (harv) with species and counts:

library(tidyverse)
library(stringi)
harv<-data.frame(spp=unlist(strsplit(df$Q7, ",")))
oth<-na.omit(data.frame(spp=stri_remove_empty(c(unlist(t(df[,3:4]))))))

idx <- grep('Other', harv$spp)
n <- min(length(idx), nrow(oth))
harv[idx, 'spp'] <- oth

harv.num<-na.omit(as.numeric(stri_remove_empty(c(unlist(t(df[,5:7]))))))

harv$num<-harv.num

harv
                               spp num
1                             Tink   1
2                             Chii   2
3 Red Drum (a.k.a. Redfish or Red)   2
4                             Blue   4
5                              Red   5
6 Red Drum (a.k.a. Redfish or Red)   7
7                             Blue   9

How can extract this data from df while also maintaining the survey ID from the original data frame so that harv looks like this:

                               spp num ID
1                             Tink   1  1
2                             Chii   2  1
3 Red Drum (a.k.a. Redfish or Red)   2  2
4                             Blue   4  2
5                              Red   5  2
6 Red Drum (a.k.a. Redfish or Red)   7  3
7                             Blue   9  3

Solution

  • You can achieve your desired result with pivot_longer and some preliminary renaming of the column names like so:

    library(dplyr, warn = FALSE)
    library(tidyr)
    
    df |> 
      rename(Q7_1_TEXT = Q7) |> 
      rename_with(~gsub("_TEXT$", "", .x), ends_with("TEXT")) |> 
      pivot_longer(-ID, names_to = c(".value", "name"), names_sep = "_") |> 
      filter(!is.na(Q8)) |> 
      select(ID, spp = Q7, num = Q8)
    #> # A tibble: 7 × 3
    #>      ID spp                                                                num  
    #>   <int> <chr>                                                              <chr>
    #> 1     1 Tink                                                               1    
    #> 2     1 Chii                                                               2    
    #> 3     2 Red Drum (a.k.a. Redfish or Red),Other species (please list speci… 2    
    #> 4     2 Blue                                                               4    
    #> 5     2 Red                                                                5    
    #> 6     3 Red Drum (a.k.a. Redfish or Red),Other species (please list speci… 7    
    #> 7     3 Blue                                                               9