This is my first post, and I am relatively new to R, so apologies if I have framed this poorly.
I have not found this problem described anywhere else but the initial approach is somewhat similar to that decribed here:
How to mutate several columns by column index rather than column name using across?.
I have a data frame containing time series data, where I would like to remove specific columns from a range of continous columns. In the example below, the values in 1R would be removed from columns 1A, 1B and 1C. Likewise the values in 2R would be removed from 2A, 2B and 2C.
So a dataframe like this
t | 1A | 1B| 1C|1RMV| 2A | 2B| 2C|2RMV|
- | - -|- -|- -| - -| - -|- -|- -|- - |
1 | 1 | 4 | 7 | 3 | 1 | 4 | 7 | 1 | . . . . . . .
2 | 2 | 5 | 8 | 2 | 2 | 5 | 8 | 2 |
3 | 3 | 6 | 9 | 1 | 3 | 6 | 9 | 3 |
Would become this
t | 1A | 1B| 1C|1RMV| 2A | 2B| 2C|2RMV|
-| - -|- -|- -| - -| - -|- -|- -|- - |
1 | -2 | 1 | 4 | 3 | 0 | 3 | 6 | 1 | . . . . . . .
2 | 0 | 3 | 6 | 2 | 0 | 3 | 6 | 2 |
3 | 2 | 5 | 8 | 1 | 0 | 3 | 6 | 3 |
I have previously performed this 'manually' and it works just fine, however since trying to make this process more automatic I am running into problems.
As the number of columns in each group (1A,1B,1C whereas 2A,2B,2C,2D,2E etc.) is different I initially create a list with index positions of all the columns which I would like to subtract from the others like so:
#Return TRUE only for columns to be removed
df_boolean <- str_ends(colnames(df), "RMV")
#Create a 1D vector with elements of index positions of columns to be removed in Data
col_number <- ncol(Intensity_Raw_Data)
remove_indices <- c()
for(i in 1:col_number){
if(df_boolean[i] == TRUE){
remove_indices <- c(background_indices, i)
}
}
Then I perform the subtraction using across from dplyr like so:
group_number <- length(remove_indices)
#Calculate subtraction for first group, probably way to do it in one loop but first column is the time column and I'm lazy
df_Subtracted <- df %>%
mutate(across(2:(remove_indices[1] - 1), ~. - df[(remove_indices[1])]))
#Calculate subtracction for remaining groups
for(i in 2:group_number){
df_Subtracted <- df_Subtracted %>%
mutate(across((remove_indices[i-1] + 1):(remove_indices[i] - 1), ~.x - df[(remove_indices[i])]))
Here I run into my problem, when running this manually (i.e. manually typing out column names in across() ), the names of the columns remains the same. However when I run this using the code above column names are renamed as such:
1A$1R 1B$1R 1C$1R . . . . 2A$2R 2B$2R 2C$2R 2D$2R. . . . . .
While the output in View() appears correct using str() reveals that each column in the output (df_Subtracted) is in fact a 1 variable data frame.
I am not sure what is causing this to happen, However I think it may be to do with how I am indexing the column to be removed in across. Any help would be appreciated !
**
- UPDATE
**
I modified GuedesBF anwser slightly by using the approach used by Akrun in this post to make a generalised anwser for data divided by column name.
df_subtracted_split <- df %>%
split.default(sub('\\d+', '', names(df))) %>%
lapply(function(x) {names(x)[ncol(x)] <- "RMV";x}) %>%
map(~mutate(.x, across(1:last_col(1), ~.x - RMV)))
df_subtracted <- do.call(qpcR:::cbind.na, Data_Final)
For some reason list_rbind/list_cbind resulted in dropping off
columns, I read here that it is probably a result of some groups in my data frame having missing rows, thus I used cbind.na
from qpcR
instead.
Thanks GuedesBF and peter861222!
This gets easier if we split.default()
the data.frame into a list of similar data.frames, do the necessary operations, and finally bind
the list back into a single data.frame
library(dplyr)
library(readr)
library(purrr)
df %>%
select(-t) %>%
split.default(parse_number(names(.)) %>%
map(~mutate(.x, across(c(2A, 2B, 2C), \(x) x - cur_data[[4]])) %>%
list_rbind()