Search code examples
rdplyr

How do you find the index of a column that contains a row with a certain partial string?


Similar to this question, but for column index instead of row: How do you find the index of a row that has a certain partial string?

I'm scraping data from PDFs, and tables on different pages have similar information in different alignments. Piece of information "A" may be in Column 2 Row 3 in Table A, and Column 1 Row 4 in Table B.

As indicated above, we can capture the row index using:

df <- data.frame(   
   v1 = c("a", "b", "c", "d", "e"),
   v2 = c("x", "y", "a", "z", "f"),
   v3 = c("x", "y", "a", "z", "s"),
   v4 = c("w", "c", "q", "p", "r")
 )
library(tidyverse)

df %>%
  mutate(ind = row_number()) %>%
  filter(if_any(everything(), ~ str_detect(., 'a'))) %>%
  pull(ind)
# [1] 1 3

But for my (admittedly very unusual) case where I don't know the column, how can I capture the column index (e.g., when searching for "a" in the above example, should be 1, 2, 3)?


Solution

  • A base R solution using colSums :

    search_term <- "a"
    which(colSums(df == search_term) > 0)
    
    #v1 v2 v3 
    # 1  2  3 
    

    The above does an exact match using ==, if you are looking for partial match you may use grepl.

    which(colSums(sapply(df, grepl, pattern = search_term)) > 0)