Concrete examples on why dimension is not defined for vectors (vectors are dimensionless) in R?

In the following exchange, the question arose of whether R needs to define a dimension for a vector:

M. JORGENSEN (Dept of Stat, U of Waikato, NZ):
"Would it not make sense to have dim(A)=length(A) for all vectors?"

B.D. RIPLEY (Dept of Applied Statistics, Oxford, UK):
"No. A one-dimensional array and a vector are not the same thing. There are subtle differences, such as what names() means (see ?names).

That a 1D array and a vector print in the same way does occasionally lead to confusion, but then you also cannot tell from your printout that A has type integer and not double."

My question:
(1) I cannot figure out the subtle difference in what names() means, and
(2) I cannot produce a concrete example of the "cannot tell from the printout that A has type integer and not double" issue.

Any help to clarify JORGENSEN-RIPLEY discussion (with concrete examples in R) will be appreciated.


  • To address the first question, let's first create a vector and a 1-d array:

    (vector <- 1:10)
    #>  [1]  1  2  3  4  5  6  7  8  9 10
    (arr_1d <- array(1:10, dim = 10))
    #>  [1]  1  2  3  4  5  6  7  8  9 10

    If we give the objects some names, we can see the difference that Ripley alludes to by looking at the attributes:

    names(vector) <- letters[1:10]
    names(arr_1d) <- letters[1:10]
    attributes(vector)
    #> $names
    #>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
    attributes(arr_1d)
    #> $dim
    #> [1] 10
    #> $dimnames
    #> $dimnames[[1]]
    #>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

    That is, the 1-d array doesn't actually have a names attribute, but rather a dimnames attribute (which is a list, not a vector), the first element of which names() actually accesses.

    This is covered in the "Note" section in ?names:

    For vectors, the names are one of the attributes with restrictions on the possible values. For pairlists, the names are the tags and converted to and from a character vector.

    For a one-dimensional array the names attribute really is dimnames[[1]].

    Here we also see the lack of a dim attribute for vectors. (A related SO answer covers the differences between arrays and vectors, too.)
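
    To make the note from ?names concrete, here is a minimal sketch in base R. The object names v and a are chosen purely for illustration. It checks which attributes each object carries, and that names() on the 1-d array returns the same values as dimnames(a)[[1]]:

    ```r
    # Illustrative objects (names chosen here, not taken from the answer)
    v <- 1:3
    a <- array(1:3, dim = 3)

    names(v) <- c("x", "y", "z")
    names(a) <- c("x", "y", "z")

    # Which attributes does each object actually carry?
    vec_attrs <- names(attributes(v))
    arr_attrs <- names(attributes(a))
    print(vec_attrs)
    print(arr_attrs)

    # names() on the 1-d array reads the first dimnames component
    same_names <- identical(names(a), dimnames(a)[[1]])
    print(same_names)

    # Only the array has a dim attribute
    print(is.null(dim(v)))
    print(dim(a))
    ```

    Inspecting attributes() rather than calling names() is deliberate here: names() gives the same answer for both objects, so it hides the difference in where the values are stored.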

    The additional attributes and their storage method mean that 1-d arrays always take up a little more memory than their vector equivalent:

    # devtools::install_github("r-lib/lobstr")
    lobstr::obj_size(vector)
    #> 848 B
    lobstr::obj_size(arr_1d)
    #> 1,056 B
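
    If lobstr is not installed, a rough version of the same comparison can be sketched with utils::object.size() from base R. Exact byte counts depend on the R version and platform and may differ from the lobstr figures above, so this sketch only compares the two sizes. The object names are again illustrative:

    ```r
    # Illustrative objects: a named vector and a named 1-d array
    v <- 1:10
    a <- array(1:10, dim = 10)
    names(v) <- letters[1:10]
    names(a) <- letters[1:10]

    # object.size() gives an estimate of the memory used, attributes included
    size_vec <- as.numeric(utils::object.size(v))
    size_arr <- as.numeric(utils::object.size(a))
    print(size_vec)
    print(size_arr)

    # The array carries dim plus a list-valued dimnames, so it should be larger
    arr_is_larger <- size_arr > size_vec
    print(arr_is_larger)
    ```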

    However, that's about the only reason I can think of why one would want to have separate types for vectors and 1-d arrays. I would assume this was really the question that Jorgensen was asking, i.e. why have a separate vector type without the dim attribute at all; and I don't think Ripley really addresses that. I'd be very interested to hear other rationale for this.

    As for point 2), when you create a vector with : from whole-number endpoints such as 1:10, the result is an integer vector:

    vector <- 1:10
    typeof(vector)
    #> [1] "integer"

    A double with the same values will print the same:

    double <- as.numeric(vector)
    typeof(double)
    #> [1] "double"
    double
    #>  [1]  1  2  3  4  5  6  7  8  9 10

    But integers and doubles are not the same thing:

    identical(vector, double)
    #> [1] FALSE
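
    One way to tell the two apart without relying on the default printout is to ask for the type explicitly. The sketch below uses only base R, and the names int_vec and dbl_vec are illustrative. It checks that the printed output is the same while typeof() and str() report different types:

    ```r
    int_vec <- 1:10
    dbl_vec <- as.numeric(int_vec)

    # The default printout gives no hint about the storage type
    same_printout <- identical(capture.output(print(int_vec)),
                               capture.output(print(dbl_vec)))
    print(same_printout)

    # typeof() and str() do report the type
    print(typeof(int_vec))
    print(typeof(dbl_vec))
    str(int_vec)
    str(dbl_vec)

    # The values compare equal element-wise, but the objects are not identical
    print(all(int_vec == dbl_vec))
    print(identical(int_vec, dbl_vec))
    ```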

    The differences between integers and doubles in R are subtle, the main one being that integers take up less space in memory (4 bytes per element rather than 8):

    lobstr::obj_size(vector)
    #> 88 B
    lobstr::obj_size(double)
    #> 168 B

    See this answer for a more comprehensive overview of the differences between integers and doubles.

    Created on 2018-07-09 by the reprex package (v0.2.0.9000).