Two data.frames
of same dimensions can be subtracted in R
using
df1 - df2
But I want to subtract two data.frames
of different dimensions like
df1 <- data.frame(V1=1:5)
df2 <- data.frame(V1=1:5, V2=6:10)
df1-df2
Error in Ops.data.frame(df1, df2) :
‘-’ only defined for equally-sized data frames
This subtracting can be achieved using for loop but I'm looking for any already established function. Thanks
Edited
How about if have to subtract two data.tables of different dimensions?
library(data.table)
dt1 <- data.table(V1=1:5)
dt2 <- data.table(V1=1:5, V2=6:10)
dt1-dt2
Error in `[.data.table`(dt1, row(dt2), ) :
i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please let datatable-help know if you'd like this, or add your comments to FR #1611.
dt1[row(dt2),]-dt2
We could do this by making both the datasets have the same length so that we can compare element-by-element of each dataset. In the example given 'df1' has 1 column with 5 rows, where as for 'df2' it is 2 columns with 5 rows. The idea would be to make 'df1' having 10 elements or either 2 columns with 5 rows to match the dimensions of 'df2'. This can be easily done by rep
or a convenient function is row
.
df1[row(df2),]-df2
Just to make it more clear
row(df2)
# [,1] [,2]
#[1,] 1 1
#[2,] 2 2
#[3,] 3 3
#[4,] 4 4
#[5,] 5 5
gives the row index for each row of 'df2'. By doing
df1[row(df2),]
#[1] 1 2 3 4 5 1 2 3 4 5
we replicate each row element twice. Given that the datasets do this in columnwise, it is happening like below
df1[c(row(df2)[,1],row(df2)[,2]),]
This can be subtracted from df2
df1[row(df2),]-df2
# V1 V2
#1 0 -5
#2 0 -5
#3 0 -5
#4 0 -5
#5 0 -5
As @David Arenburg mentioned, this would return wrong results for both datasets having multiple columns. So, if you are going to subtract a single column from 'df1' (having multiple columns) from a multiple column dataset ('df2'), then selecting that column and subtracting from 'df2' may be more general (Thanks to @David Arenburg's code)
df1$V1-df2
# V1 V2
#1 0 -5
#2 0 -5
#3 0 -5
#4 0 -5
#5 0 -5
It works due to the recycling effect, i.e. 'V1' column elements will subtract from the first column of 'df2', then it will again start from the first element to start subtracting from the second column of 'df2' (assuming that both datasets have the same number or rows).
For the second example with data.table
(single column for 'dt1'), one option would be
dt1[,rep(names(dt1), ncol(dt2)),with=FALSE]-dt2
# V1 V1
#1: 0 -5
#2: 0 -5
#3: 0 -5
#4: 0 -5
#5: 0 -5