Tags: r, symbols, quoting, rlang, quasiquotes

Only strings can be converted to symbols within a function in R


I have a function that is intended to operate on data obtained from a variety of sources with many manual entry fields. Since I don't know what layout or naming convention to expect in these files, I want it to 'scan' a data frame for a column whose name contains the string 'fix', 'name', or 'agent', copy that column into a new column named 'Firm', do string cleaning on the entries of the new column, and finally remove the original column. I have gotten it to work with SOME of the CSVs that I already have, but I have now run into this error: ONLY STRINGS CAN BE CONVERTED TO SYMBOLS. I have checked the thread "ERROR: Only strings can be converted to symbols", but to no avail.

Here is the function at the moment:

clean_firm_names2 <- function(df){
  df <- df %>%
    mutate(Firm := !!rlang::sym(grep(pattern = '(AGENT)|(NAME)|(FIX)',x = colnames(.), ignore.case = T, value = T)) %>% 
             str_replace_all(pattern = "(\\W)+"," ") %>% 
             ...str manipulations...
             str_squish()) %>%
    dplyr::select(-(!!rlang::sym(grep(pattern = '(AGENT)|(NAME)|(FIX)',x = colnames(.), ignore.case = T, value = T))))
  return(df)
}

I have tried wrapping the grep() call in as.character(), but that did not solve the problem. I have looked at the CSV that the function is meant to operate on, and all of the column names are character strings. I read in the CSV using vroom(), as with my other CSVs, and that works fine: all of the column names appear. I can perform other dplyr operations on the df, which suggests to me that the df is otherwise behaving normally. I have run out of ideas as to why the function chokes on only SOME of my CSVs but works as intended on others. Has anyone run into similar issues, or does anyone have a clue as to what might be causing this error? This is the first time I've used SO, so I'm sorry if this question isn't very clear. I'll try to edit as needed.

Thanks!


Solution

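  • Note that grep() returns the integer indices of the matches by default, and the matching strings themselves only when value = TRUE. In either case the result is a vector that may be empty or hold several elements, whereas rlang::sym() accepts exactly one string, so a file with no matching column, or with more than one, would likely raise this error. Column positions or names can be passed directly to dplyr::rename, so perhaps the following may work better?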

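    # Without value = TRUE, grep() returns the integer positions of the matching columns
    i <- grep(pattern = '(AGENT)|(NAME)|(FIX)', x = colnames(df), ignore.case = TRUE)
    df <- df %>%
      # all_of() marks i as an external vector; rename() also drops the old name
      rename(Firm = all_of(i)) %>%
      mutate(Firm = ...str manipulations... )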
    

    (There is an implicit assumption here that your grep() returns a single index. Additional code may be required to handle multiple matches.)
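
One possible way to make that single-match assumption explicit is to check the number of matches before renaming. The sketch below is only an illustration: find_firm_column() and clean_firm_names() are hypothetical names that do not appear in the original post, the sample data frame is made up, and only the two cleaning steps shown in the question are included (the elided string manipulations are left out).

```r
library(dplyr)
library(stringr)

# Hypothetical helper: return the name of the one column whose name contains
# AGENT, NAME, or FIX (case-insensitive); stop with a readable message otherwise.
find_firm_column <- function(df) {
  hits <- grep("AGENT|NAME|FIX", colnames(df), ignore.case = TRUE, value = TRUE)
  if (length(hits) != 1) {
    stop(
      sprintf(
        "Expected exactly one matching column, found %d: %s",
        length(hits), paste(hits, collapse = ", ")
      ),
      call. = FALSE
    )
  }
  hits
}

# Hypothetical wrapper: rename the matched column to Firm, then clean it.
clean_firm_names <- function(df) {
  firm_col <- find_firm_column(df)
  df %>%
    rename(Firm = all_of(firm_col)) %>%   # renaming also removes the old name
    mutate(Firm = Firm %>%
      str_replace_all("\\W+", " ") %>%    # runs of non-word characters -> one space
      str_squish())                       # trim and collapse whitespace
}

# Made-up example data, for illustration only
example_df <- data.frame(
  id = 1:2,
  agent_name = c("Acme,  Inc.", "Foo & Bar LLC"),
  stringsAsFactors = FALSE
)
cleaned <- clean_firm_names(example_df)
```

With this arrangement, a CSV that has no matching column, or several, fails with a message listing what was found instead of the symbol-conversion error, which may make it easier to see which files need special handling.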