Tags: r, regex, text, filter, data-cleaning

Remove rows that do NOT contain a letter in them using R


I'm trying to filter my data so that I keep only the rows whose IDs contain at least one letter (in any position). I'm stumped because many rows contain random characters or stray whitespace, so even when I try to filter on whitespace, some of them slip through.

Here's my data:

library(tidyverse)
test <- tibble(id = c("   ", "91a", "90", "ab"),
               score = c(5, 10, 15, 91))

And here's what I want:

library(tidyverse)
answer <- tibble(id = c("91a", "ab"),
                 score = c(10, 91))

Thank you!


Solution

  • You can use subset() with grepl(), which keeps only the rows where id contains at least one letter:

    subset(test, grepl('[a-zA-Z]', id))
    
    #   id    score
    #  <chr> <dbl>
    #1 91a      10
    #2 ab       91
    

    Or with dplyr:

    library(dplyr)
    test %>% filter(grepl('[a-zA-Z]', id))
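To see why this works, it can help to look at what grepl() returns on its own: one TRUE/FALSE per id, TRUE wherever the pattern matches anywhere in the string. The sketch below rebuilds the question's `test` data and also shows a stringr variant. The `\\p{L}` pattern (any Unicode letter) is an assumption on my part, useful only if the IDs may contain accented or non-Latin letters, which the question does not say.

```r
library(tidyverse)

# Sample data from the question
test <- tibble(id = c("   ", "91a", "90", "ab"),
               score = c(5, 10, 15, 91))

# grepl() returns a logical vector: TRUE where id contains an ASCII letter
has_letter <- grepl("[a-zA-Z]", test$id)
print(has_letter)
# FALSE  TRUE FALSE  TRUE

# Same filter written with stringr::str_detect()
result <- test %>% filter(str_detect(id, "[a-zA-Z]"))

# Assumption: if IDs may contain non-ASCII letters (e.g. "é"),
# \\p{L} matches any Unicode letter in stringr's regex engine
result_unicode <- test %>% filter(str_detect(id, "\\p{L}"))
```

Whitespace-only and digit-only IDs are dropped without any special handling, because the pattern only asks whether a letter is present.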