Search code examples
regexrdataframebatch-rename

Perform multiple search-and-replaces on the colnames of a dataframe


I have a dataframe with 95 cols and want to batch-rename a lot of them with simple regexes, like the snippet at bottom, there are ~30 such lines. Any other columns which don't match the search regex must be left untouched.

**** Example: names(tr) = c('foo', 'bar', 'xxx_14', 'xxx_2001', 'yyy_76', 'baz', 'zzz_22', ...) ****

I started out with a wall of 25 gsub()s - crude but effective:

names(tr) <- gsub('_1$',    '_R', names(tr))
names(tr) <- gsub('_14$',   '_I', names(tr))
names(tr) <- gsub('_22$',   '_P', names(tr))
names(tr) <- gsub('_50$',   '_O', names(tr))
... yada yada

@Joshua: mapply doesn't work, turns out it's more complicated and impossible to vectorize. names(tr) contains other columns, and when these patterns do occur, you cannot assume all of them occur, let alone in the exact order we defined them. Hence, try 2 is:

pattern <- paste('_', c('1','14','22','50','52','57','76','1018','2001','3301','6005'), '$', sep='')
replace <- paste('_', c('R','I', 'P', 'O', 'C', 'D', 'M', 'L',   'S',   'K',   'G'),         sep='')
do.call(gsub, list(pattern, replace, names(tr)))
Warning messages:
1: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'pattern' has length > 1 and only the first element will be used
2: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'replacement' has length > 1 and only the first element will be used

Can anyone fix this for me?


EDIT: I read all around SO and R doc on this subject for over a day and couldn't find anything... then when I post it I think of searching for '[r] translation table' and I find xlate. Which is not mentioned anywhere in the grep/sub/gsub documentation.

  1. Is there anything in base/gsubfn/data.table etc. to allow me to write one search-and-replacement instruction? (like a dictionary or translation table)

  2. Can you improve my clunky syntax to be call-by-reference to tr? (mustn't create temp copy of entire df)


EDIT2: my best effort after reading around was:

The dictionary approach (xlate) might be a partial answer to, but this is more than a simple translation table since the regex must be terminal (e.g. '_14$').

I could use gsub() or strsplit() to split on '_' then do my xlate translation on the last component, then paste() them back together. Looking for a cleaner 1/2-line idiom.

Or else I just use walls of gsub()s.


Solution

  • Wall of gsub could be always replace by for-loop. And you can write it as a function:

    renamer <- function(x, pattern, replace) {
        for (i in seq_along(pattern))
                x <- gsub(pattern[i], replace[i], x)
        x
    }
    
    names(tr) <- renamer(
         names(tr),
         sprintf('_%s$', c('1','14','22','50','52','57','76','1018','2001','3301','6005')),
         sprintf('_%s' , c('R','I', 'P', 'O', 'C', 'D', 'M', 'L',   'S',   'K',   'G'))
    )
    

    And I found sprintf more useful than paste for creation this kind of strings.