
Compare two pairs of columns with different lengths


I have the following two sample data frames of different lengths with the same column names:

data1 <- data.frame(name = c("siva", "ramu", "giri"),
                    xx = c(1, 0, 3))

  name xx
1 siva  1
2 ramu  0
3 giri  3



data2 <- data.frame(name = c("siva", "ramya", "giri", "geetha", "pallavi"),
                    xx = c(1, 2, 3, 4, 5))

     name xx
1    siva  1
2   ramya  2
3    giri  3
4  geetha  4
5 pallavi  5

I want to compare the pair of columns in data1 with the corresponding pair of columns in data2, row by row. For example, the first row of data1 is identical to the first row of data2, so the result for that row should be TRUE. The same holds for row 3. For the other rows we should get FALSE.

I tried

library(arsenal)
comparedf(data1,data2)
Compare Object

Function Call: 
comparedf(x = data1, y = data2)

Shared: 2 non-by variables and 3 observations.
Not shared: 0 variables and 2 observations.

Differences found in 2/2 variables compared.
0 variables compared have non-identical attributes


Is that the correct approach? If it is, I cannot interpret this output.


Solution

  • If you want to use the comparedf function, you need to summarise the results.

    Without a "by" argument, the data frames are compared row by row (as stated in the help page):

    summary(comparedf(data1, data2))
    

    This gives (after omitting some irrelevant output):

    Table: Summary of data.frames
    
    version   arg      ncol   nrow
    --------  ------  -----  -----
    x         data1       2      3
    y         data2       2      5
    
    Table: Summary of overall comparison
    
    statistic                                                      value
    ------------------------------------------------------------  ------
    Number of by-variables                                             0
    Number of non-by variables in common                               2
    Number of variables compared                                       2
    Number of variables in x but not y                                 0
    Number of variables in y but not x                                 0
    Number of variables compared with some values unequal              2
    Number of variables compared with all values equal                 0
    Number of observations in common                                   3
    Number of observations in x but not y                              0
    Number of observations in y but not x                              2
    Number of observations with some compared variables unequal        1
    Number of observations with all compared variables equal           2
    Number of values unequal                                           2
    
    Table: Observations not shared
    
    version    ..row.names..   observation
    --------  --------------  ------------
    y                      4             4
    y                      5             5
    
    Table: Differences detected by variable
    
    var.x   var.y     n   NAs
    ------  ------  ---  ----
    name    name      1     0
    xx      xx        1     0
    
    Table: Differences detected
    
    var.x   var.y    ..row.names..  values.x   values.y    row.x   row.y
    ------  ------  --------------  ---------  ---------  ------  ------
    name    name                 2  ramu       ramya           2       2
    xx      xx                   2  0          2               2       2
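
  • If all you need is the TRUE/FALSE result per row, a plain base R comparison may be enough. The following is a minimal sketch and not part of arsenal: it assumes rows are matched by position, that both data frames have the columns name and xx, and that rows of the longer data frame with no counterpart should count as FALSE. The names idx, same_row and padded are illustrative only.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps 'name' as
# character so that == also works on R versions before 4.0.
data1 <- data.frame(name = c("siva", "ramu", "giri"),
                    xx = c(1, 0, 3),
                    stringsAsFactors = FALSE)
data2 <- data.frame(name = c("siva", "ramya", "giri", "geetha", "pallavi"),
                    xx = c(1, 2, 3, 4, 5),
                    stringsAsFactors = FALSE)

# Row positions that exist in both data frames.
idx <- seq_len(min(nrow(data1), nrow(data2)))

# A row matches only if both columns agree at the same position.
same_row <- data1$name[idx] == data2$name[idx] &
  data1$xx[idx] == data2$xx[idx]
print(same_row)

# Assumption: rows without a counterpart are reported as FALSE.
padded <- logical(max(nrow(data1), nrow(data2)))
padded[idx] <- same_row
print(padded)
```

    Here same_row covers the three shared rows and padded extends the result to the length of the longer data frame. If rows should instead be matched by name rather than by position, this positional sketch does not apply.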