Tags: r, dataframe, dplyr, cbind

Different behaviours between base::cbind and dplyr::bind_cols


When combining a data frame and a vector with a different number of rows/length, bind_cols gives an error, whereas cbind repeats rows. Why is this?

(And is it really wise to have that as a default behavior of cbind?)

See example data below.

# Example data
library(dplyr)  # provides tibble() and bind_cols()
x10 <- 1:10
y10 <- 1:10
xy10 <- tibble(x10, y10)

z20 <- 1:20

# get an error
xyz20 <- dplyr::bind_cols(xy10, z20)

# why does cbind repeat rows of xy10 to suit z20?
xyz20 <- cbind(xy10, z20)
xyz20

Solution

  • base::cbind is a generic function, and its behavior differs for matrices and data frames.

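    For matrices, the matrix determines the number of rows: a vector argument is recycled or truncated to fit, with a warning unless its length divides that number evenly (see the Note below). Here z20 is cut down to 10 values: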

    cbind(as.matrix(xy10), z20)
    #      x10 y10 z20
    # [1,]   1   1   1
    # [2,]   2   2   2
    # [3,]   3   3   3
    # [4,]   4   4   4
    # [5,]   5   5   5
    # [6,]   6   6   6
    # [7,]   7   7   7
    # [8,]   8   8   8
    # [9,]   9   9   9
    #[10,]  10  10  10
    #Warning message:
    #In cbind(as.matrix(xy10), z20) :
    #  number of rows of result is not a multiple of vector length (arg 2)
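A minimal base-R sketch of the matrix case, using a plain matrix `m` and vector `v` as illustrative stand-ins for `as.matrix(xy10)` and `z20`. The warning is captured with `withCallingHandlers()` so that the truncation and the warning can both be checked directly.

```r
# Illustrative stand-ins: m plays the role of as.matrix(xy10), v of z20
m <- cbind(x10 = 1:10, y10 = 1:10)   # 10 x 2 integer matrix
v <- 1:20                            # vector longer than nrow(m)

caught <- NULL
res <- withCallingHandlers(
  cbind(m, z20 = v),
  warning = function(w) {
    caught <<- conditionMessage(w)   # keep the warning text
    invokeRestart("muffleWarning")   # and stop it from printing
  }
)

dim(res)       # 10 3: the matrix fixes the row count
res[, "z20"]   # 1:10: v was truncated to fit
caught         # the "not a multiple of vector length" warning text
```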
    

    For data frames, however, cbind builds a new data frame from scratch by calling data.frame(). So the following two calls are equivalent, both giving a data frame of 20 rows:

    cbind(xy10, z20)
    
    ## in this way, R's recycling rule steps in
    data.frame(xy10[, 1], xy10[, 2], z20)
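The recycling comes from `data.frame()`, which repeats the rows of a shorter argument only when its row count divides the longest one exactly; otherwise it stops with an error. The sketch below assumes a plain data.frame `df10` in place of the question's tibble so that it needs only base R (the question's own output shows the same row repetition with the tibble); `df10`, `out` and `err` are illustrative names.

```r
df10 <- data.frame(x10 = 1:10, y10 = 1:10)

# 20 is a multiple of 10, so the 10 rows of df10 are repeated
out <- cbind(df10, z20 = 1:20)
nrow(out)       # 20
head(out$x10, 12)   # 1 2 ... 10 1 2: rows 11-20 repeat rows 1-10

# 15 is not a multiple of 10, so data.frame() refuses to recycle
err <- tryCatch(cbind(df10, z15 = 1:15), error = conditionMessage)
err             # an "arguments imply differing number of rows" message
```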
    

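    From ?cbind (note that since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so character columns are no longer converted to factors unless requested):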

    The ‘cbind’ data frame method is just a wrapper for ‘data.frame(..., check.names = FALSE)’. This means that it will split matrix columns in data frame arguments, and convert character columns to factors unless ‘stringsAsFactors = FALSE’ is specified.
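One visible consequence of the `check.names = FALSE` in that wrapper is that `cbind()` keeps duplicated column names, whereas a plain `data.frame()` call makes them unique. A small base-R check, with `a` and `b` as made-up example frames:

```r
a <- data.frame(id = 1:3)
b <- data.frame(id = 4:6)

names(cbind(a, b))                             # "id" "id": duplicates kept
names(data.frame(a, b, check.names = FALSE))   # "id" "id": same as cbind()
names(data.frame(a, b))                        # "id" "id.1": made unique
```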


    Note: in non-data-frame cases, matrices never grow to fit the other arguments; only vectors are recycled or truncated.

    ## handling two vectors
    ## vector of shorter length is recycled
    cbind(1:2, 1:4)
    #     [,1] [,2]
    #[1,]    1    1
    #[2,]    2    2
    #[3,]    1    3
    #[4,]    2    4
    
    ## handling two matrices
    ## has strict requirement on dimensions
    cbind(as.matrix(1:2), as.matrix(1:4))
    #Error in cbind(as.matrix(1:2), as.matrix(1:4)) : 
    #  number of rows of matrices must match (see arg 2)
    
    ## handling a matrix and a vector
    ## vector of shorter length is recycled
    cbind(1:2, as.matrix(1:4))
    #     [,1] [,2]
    #[1,]    1    1
    #[2,]    2    2
    #[3,]    1    3
    #[4,]    2    4
    
    ## handling a matrix and a vector
    ## vector of longer length is truncated
    cbind(as.matrix(1:2), 1:4)
    #     [,1] [,2]
    #[1,]    1    1
    #[2,]    2    2
    #Warning message:
    #In cbind(as.matrix(1:2), 1:4) :
    #  number of rows of result is not a multiple of vector length (arg 2)
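The all-vector case still recycles when the longer length is not a multiple of the shorter one, but then `cbind()` warns about the partial recycle. A base-R sketch, again capturing the warning with `withCallingHandlers()`:

```r
msg <- NULL
res <- withCallingHandlers(
  cbind(1:3, 1:4),                  # 4 is not a multiple of 3
  warning = function(w) {
    msg <<- conditionMessage(w)     # keep the warning text
    invokeRestart("muffleWarning")
  }
)

res[, 1]   # 1 2 3 1: 1:3 recycled (partially) to length 4
msg        # the "not a multiple of vector length" warning text
```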
    

    From ?cbind:

    If there are several matrix arguments, they must all have the same number of rows....

    If all the arguments are vectors, ..., values in shorter arguments are recycled to achieve this length...

    When the arguments consist of a mix of matrices and vectors, the number of rows of the result is determined by the number of rows of the matrix arguments... vectors... are recycled or subsetted to achieve this length.
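The three quoted rules can be condensed into a small predictor of the result's row count. `expected_cbind_nrow()` is a hypothetical helper written only for illustration, not part of base R, and it deliberately ignores the special handling of zero-length and `NULL` arguments.

```r
# Hypothetical helper (not base R): predict nrow(cbind(...)) for matrix and
# vector arguments, following the rules quoted from ?cbind.
# Assumption: zero-length and NULL arguments are not handled specially.
expected_cbind_nrow <- function(...) {
  args <- list(...)
  is_mat <- vapply(args, is.matrix, logical(1))
  if (any(is_mat)) {
    # Matrices fix the row count and must all agree
    rows <- unique(vapply(args[is_mat], nrow, integer(1)))
    if (length(rows) != 1L) stop("matrix arguments have different numbers of rows")
    rows
  } else {
    # All vectors: the longest one sets the row count
    max(lengths(args))
  }
}

expected_cbind_nrow(1:2, 1:4)              # 4
expected_cbind_nrow(1:2, as.matrix(1:4))   # 4
expected_cbind_nrow(as.matrix(1:2), 1:4)   # 2
```

Comparing its result with `nrow(cbind(...))` for the calls in the Note is a quick way to check the rules against the actual behavior.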