I have a data frame with a specific pattern in its column names:
wssgroup_1_norm
wxsgroup_2_norm
wargroup_3_norm
wetgroup_10_norm
wegroup_11_norm
I used an approach that recognizes the group_* string (*: any number from 1 to 11) and then replaces it with another string.
IMPORTANT: please notice that group_* is bound to more text without an underscore in between, like this: wssgroup_1.
Here is the code:
# Mapping of group names to replacement strings
group_names_l <- list(
group_1 = "_IgG1", group_2 = "_IgG2", group_3 = "_IgG3", group_4 = "_IgG4",
group_5 = "_IgA1", group_6 = "_IgM", group_7 = "_FcyR2", group_8 = "_FcyR2b",
group_9 = "_FcyR3av", group_10 = "_FcyR3b", group_11 = "_C1q"
)
# Replace column names using the group_names_l mapping
colnames(luminex_data_o_v2) <- sapply(colnames(luminex_data_o_v2), function(col) {
  for (key in names(group_names_l)) {
    if (grepl(key, col)) {
      return(gsub(key, group_names_l[[key]], col))
    }
  }
  return(col)
})
This approach only partially works: I realized that groups 1, 10 and 11 are all recognized as group_1 and replaced with _IgG1 in all three cases. So the new column names are only partially correct, and my new df has incorrect column names for groups 10 and 11.
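A minimal sketch of the collision, using three of the column names above (the variable names cols, hits and renamed are only for illustration). The plain pattern group_1 is a prefix of group_10 and group_11, so it matches all three:

```r
cols <- c("wssgroup_1_norm", "wetgroup_10_norm", "wegroup_11_norm")

# The unanchored key "group_1" is found inside "group_10" and "group_11" too.
hits <- grepl("group_1", cols)

# Because the loop returns on the first matching key, group_10 and group_11
# never get their own replacement; the trailing digit is simply left behind.
renamed <- gsub("group_1", "_IgG1", cols)

print(hits)
print(renamed)
```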
Then I received some advice (from StackOverflow) to replace the string using word boundaries. One approach was this:
gsub(paste0("^", key,"$"), group_names_l[[key]], col)
or this:
gsub(paste0("\\b", key,"\\b"), group_names_l[[key]], col)
However, these two new approaches do nothing to my df: they do not match anything in my column names.
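A small check, assuming one of the column names listed earlier, of why neither pattern matches (col, key, anchored and bounded are illustrative names):

```r
col <- "wssgroup_1_norm"
key <- "group_1"

# ^...$ only matches when the key is the entire column name.
anchored <- grepl(paste0("^", key, "$"), col)

# \\b needs a transition between a word and a non-word character, but the
# key is preceded by "s" and followed by "_", which are both word characters.
bounded <- grepl(paste0("\\b", key, "\\b"), col)

print(c(anchored = anchored, bounded = bounded))
```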
Questions:
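It is clear what is going on when you actually look at the group_names_l list and your column names: each key is attached on the left to another word character in your data (as in wssgroup_1) and is followed on the right by _ , which is also a word character. So \\b never matches on either side, and ^ with $ only matches when the key is the whole column name. No word boundary check is necessary on the left side. In the end, all you want is to check that there is no digit on the right: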
colnames(df) <- sapply(colnames(df), function(col) {
  for (key in names(group_names_l)) {
    if (grepl(paste0(key, "(?!\\d)"), col, perl=TRUE)) {
      return(sub(paste0(key, "(?!\\d)"), group_names_l[[key]], col, perl=TRUE))
    }
  }
  return(col)
})
I added perl=TRUE to both the grepl and sub commands since the patterns require a PCRE regex engine.
I replaced gsub with sub since you expect a single replacement in the string anyway.
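For a quick end-to-end check, here is a self-contained sketch that applies the negative-lookahead pattern to a toy data frame. The data frame df, its values and the helper name rename_col are assumptions made up for illustration; only the column names and the mapping come from the question.

```r
# Mapping of group names to replacement strings (from the question)
group_names_l <- list(
  group_1 = "_IgG1", group_2 = "_IgG2", group_3 = "_IgG3", group_4 = "_IgG4",
  group_5 = "_IgA1", group_6 = "_IgM", group_7 = "_FcyR2", group_8 = "_FcyR2b",
  group_9 = "_FcyR3av", group_10 = "_FcyR3b", group_11 = "_C1q"
)

# Toy data frame; the values are placeholders, only the names matter here.
df <- data.frame(
  wssgroup_1_norm = 1, wxsgroup_2_norm = 2, wargroup_3_norm = 3,
  wetgroup_10_norm = 4, wegroup_11_norm = 5
)

# Hypothetical helper: replace the first key that is not followed by a digit.
rename_col <- function(col) {
  for (key in names(group_names_l)) {
    pattern <- paste0(key, "(?!\\d)")  # lookahead requires perl = TRUE
    if (grepl(pattern, col, perl = TRUE)) {
      return(sub(pattern, group_names_l[[key]], col, perl = TRUE))
    }
  }
  col
}

# vapply guarantees a character vector; USE.NAMES = FALSE drops the old names.
colnames(df) <- vapply(colnames(df), rename_col, character(1), USE.NAMES = FALSE)
print(colnames(df))
```

With the lookahead in place, group_1 no longer matches inside group_10 or group_11, so each of those columns falls through to its own key.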