r matrix dataframe r-faq

Should I use a data.frame or a matrix?


When should one use a data.frame, and when is it better to use a matrix?

Both keep data in a rectangular format, so sometimes it's unclear.

Are there any general rules of thumb for when to use which data type?


Solution

  • Part of the answer is contained already in your question: You use data frames if columns (variables) can be expected to be of different types (numeric/character/logical etc.). Matrices are for data of the same type.

    Consequently, the choice between a matrix and a data.frame is only difficult if all of your data is of the same type.
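
    As a minimal sketch of that type difference (the column names id and name are made up for illustration), building the same two columns as a matrix and as a data frame shows what happens to mixed types:

    ```r
    # A matrix holds a single type, so mixing integers and strings
    # coerces every element to character.
    m <- cbind(id = 1:2, name = c("a", "b"))
    typeof(m)
    # "character"

    # A data frame keeps each column's own type.
    # stringsAsFactors = FALSE is set explicitly so the result is the
    # same on R versions before 4.0.
    d <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
    sapply(d, class)
    # id: "integer", name: "character"
    ```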

    The answer then depends on what you are going to do with the data. If it is going to be passed to other functions, the argument types those functions expect determine the choice.

    Also:

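    Matrices are more memory efficient, because a data frame is a list of columns that carries extra attributes (column names, row names, class):

    m <- matrix(1:4, 2, 2)
    d <- as.data.frame(m)
    object.size(m)
    # 216 bytes (exact sizes vary with R version and platform)
    object.size(d)
    # 792 bytes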
    

    Matrices are a necessity if you plan to do any linear algebra operations.
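
    A small sketch of why, using made-up numbers: matrix multiplication and solve() work directly on a matrix, while a data frame has to be converted with as.matrix() first.

    ```r
    m <- matrix(c(2, 1, 1, 3), nrow = 2)
    b <- c(1, 2)

    x <- solve(m, b)    # solves m %*% x = b
    m %*% x             # reproduces b, as a 2 x 1 matrix

    d <- as.data.frame(m)
    # d %*% x fails, because %*% needs numeric or complex
    # matrix/vector arguments and a data frame is a list.
    failed <- inherits(try(d %*% x, silent = TRUE), "try-error")
    failed
    # TRUE

    # Converting back to a matrix makes the operation work again.
    as.matrix(d) %*% x
    ```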

    Data frames are more convenient if you frequently refer to their columns by name (via the compact $ operator).
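
    For comparison, here is a short sketch with hypothetical height and weight columns. A matrix can have named columns too, but they are only reachable through bracket indexing:

    ```r
    d <- data.frame(height = c(1.6, 1.8), weight = c(60, 80))
    d$height
    # 1.6 1.8

    m <- as.matrix(d)
    m[, "height"]       # same values, but needs brackets and quotes

    # m$height is an error, since $ is not valid for atomic vectors.
    dollar_fails <- inherits(try(m$height, silent = TRUE), "try-error")
    dollar_fails
    # TRUE
    ```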

    Data frames are also IMHO better for reporting (printing) tabular information as you can apply formatting to each column separately.
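
    One way to do that per-column formatting, sketched with made-up item, price and share columns, is to convert each column to formatted text with sprintf() before printing:

    ```r
    d <- data.frame(item  = c("a", "b"),
                    price = c(3.5, 12),
                    share = c(0.25, 0.75),
                    stringsAsFactors = FALSE)

    # Each column gets its own format: two decimals for price,
    # a whole-number percentage for share.
    d$price <- sprintf("%.2f", d$price)
    d$share <- sprintf("%.0f%%", 100 * d$share)

    print(d, row.names = FALSE)
    ```

    Doing the same in a matrix would turn every cell into a string, so the numeric columns would no longer be usable for calculations.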