
Can't `filter()` a numeric column even though R says it is a number (`typeof()` is "double")


I am new to R, but I have looked around and tried everything I can think of. Here is what I did, step by step:

  • Pulled data from Qualtrics into a CSV file
  • Used read_csv() (much harder with read.csv()) to skip all the header rows (3 in this case), saving one of them as the column names via names()
  • Pulled in the data with read_csv() and assigned the header saved from names()
  • Filtering Duration (in seconds) based on its numerical value does not work. That is, `filter('Duration (in seconds)' == 0)` yields a dataframe with no observations.

I have:

  • Successfully filtered other numerical columns
  • Verified that typeof(test$'Duration (in seconds)') is "double"
  • Verified that read_csv() imports 'Duration (in seconds)' as double (i.e., 'Duration (in seconds)' = col_double())

Sample code

library(readr)
library(dplyr)

df_names <- read_csv("file.csv", n_max = 0) %>% names()
test <- read_csv("file.csv", skip = 3, col_names = df_names, trim_ws = TRUE)
test2 <- test %>% filter('Duration (in seconds)' == 0) # returns no rows, but should return 6
test2 <- test %>% filter('Duration (in seconds)' > 0) # returns all rows, but should return 3

Data: file.csv


Solution

  • Try replacing your quotes with backticks when referencing your variable name:

    test2 <- test %>% filter(`Duration (in seconds)` == 0) # should now return the expected 6 rows
    test2 <- test %>% filter(`Duration (in seconds)` > 0) # should now return the expected 3 rows
    

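    Explanation: quotation marks denote strings in R, so 'Duration (in seconds)' inside filter() is a string literal, not a reference to your column. The condition 'Duration (in seconds)' == 0 compares that string with 0 and evaluates to a single FALSE, which filter() applies to every row, so no rows come back. Likewise, 'Duration (in seconds)' > 0 is a string comparison that evaluates to TRUE, which is why the second filter returned every row.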

    Backticks have a few uses in R, but one of them is to let you refer to names that are otherwise non-syntactic. They are needed here because your column name contains spaces and parentheses. Without them, R would try to parse Duration (in seconds) as code rather than as a single name, and would fail with a syntax error.
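
To see the difference without the original file, here is a minimal sketch on made-up data. The tibble below is a hypothetical stand-in for the Qualtrics export (six rows with a duration of 0 and three above 0, to match the counts in the question), and the `id` column is only there for illustration. It also shows the `.data[["..."]]` pronoun, which is one way to refer to a column by a string if you would rather not use backticks.

```r
library(dplyr)

# Hypothetical stand-in for the Qualtrics export (not the asker's real data):
# six rows with a duration of 0 and three rows with a duration above 0.
test <- tibble(
  id = 1:9,
  `Duration (in seconds)` = c(0, 0, 0, 0, 0, 0, 12, 45, 300)
)

# Quoted: this compares a string literal with 0, giving one logical value
# that does not depend on the data at all.
quoted_condition <- "Duration (in seconds)" == 0

# filter() applies that single value to every row.
n_quoted_eq <- test %>% filter("Duration (in seconds)" == 0) %>% nrow()

# Backticks: the name now refers to the column, so rows are tested one by one.
n_backtick_eq <- test %>% filter(`Duration (in seconds)` == 0) %>% nrow()
n_backtick_gt <- test %>% filter(`Duration (in seconds)` > 0) %>% nrow()

# Alternative: look the column up by its name as a string via the .data pronoun.
n_pronoun_eq <- test %>% filter(.data[["Duration (in seconds)"]] == 0) %>% nrow()

print(c(quoted = n_quoted_eq, backtick_eq = n_backtick_eq,
        backtick_gt = n_backtick_gt, pronoun_eq = n_pronoun_eq))
```

On this toy data, the quoted version keeps no rows, while the backtick and `.data` versions keep the six rows with a duration of 0. If you filter on this column often, renaming it to a syntactic name (for example with `rename()`) would avoid the quoting question altogether.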