
Test for existing row.names and col.names in a data.frame


Is there a function to determine whether a data.frame has user-supplied row names and column names, or only automatically generated ones (1, 2, 3, 4, ...)? For column names, "automatic" means, for instance, the names you get when you apply as.data.frame() to a matrix.

For the row names, I figured out a workaround:

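has.row.names <- function(df) {
  # automatic row names are "1", "2", ..., "n"; anything else counts as real names
  !identical(row.names(df), as.character(seq_len(nrow(df))))
}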

However, for the column names I don't see how to do it. The difficulty is that automatically generated column names are sometimes V1, V2, etc. and sometimes X1, X2, etc.

Why I ask: I need to perform this test inside a more complex function (somewhat similar to the graphical output of a PCA) that plots the row and column names if they exist and otherwise creates more suitable names. So it should work for "any" data.frame, without knowing anything about its actual names in advance.


Solution

  • Short version: The only time a data frame would not have column names is when the attribute "names" is NULL. So a simple way to check whether a data frame has column names is something like the following.

    DFHasColNames <- function(x) {
        stopifnot(is.data.frame(x))
        Negate(is.null)(names(x))
    }
    DFHasColNames(mtcars)
    # [1] TRUE
    DFHasColNames(unname(mtcars))
    # [1] FALSE
    

    Extended version: For row names, you can use .row_names_info(). With the default type = 1L, a negative sign indicates the row names were generated automatically.

    .row_names_info(mtcars)
    # [1] 32   # row names were provided 
    .row_names_info(iris)
    # [1] -150 # row names were generated automatically
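
    The sign test above can replace the has.row.names() workaround from the question. This is a minimal sketch; the helper name has_auto_row_names() is hypothetical, and it simply checks whether .row_names_info() returns a negative count:

    ```r
    # Hypothetical helper: TRUE when the row names were generated automatically.
    # With type = 1L, .row_names_info() returns the row count, negated for
    # automatic row names.
    has_auto_row_names <- function(df) {
        stopifnot(is.data.frame(df))
        .row_names_info(df, type = 1L) < 0L
    }

    has_auto_row_names(mtcars)               # FALSE: row names are the car models
    has_auto_row_names(iris)                 # TRUE
    has_auto_row_names(data.frame(a = 1:3))  # TRUE
    ```

    Unlike comparing row.names(df) with 1:n, this relies on R's own record of automatic row names rather than on what the names happen to look like.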
    

    You can also view other information by changing the type argument.

    From the documentation of .row_names_info(), the type argument is an integer: "Currently type = 0 returns the internal "row.names" attribute (possibly NULL), type = 2 the number of rows implied by the attribute, and type = 1 the latter with a negative sign for ‘automatic’ row names."

    .row_names_info(mtcars, type = 0)
    ## returns the stored attribute, here the 32 car names (same as attr(mtcars, "row.names"))
    .row_names_info(iris, type = 0)
    ## [1]   NA -150
    

    For column names, it's not so easy. Generally speaking, if the printed column headers are all NA, or names(x) returns NULL, the "names" attribute of x is not set and therefore x has no (column) names.

    Otherwise, a leading X usually means the names came from make.names(), which data.frame(), read.table(), read.csv() and others apply (through their check.names argument) to turn invalid names such as "1" into syntactic ones such as "X1".

    m <- matrix(1:6, 2)
    make.names(seq_len(ncol(m)))
    # [1] "X1" "X2" "X3"
    data.frame(m)
    #   X1 X2 X3
    # 1  1  3  5
    # 2  2  4  6
    

    whereas as.data.frame() on the same matrix generally gives a leading V:

    as.data.frame(m)
    #   V1 V2 V3
    # 1  1  3  5
    # 2  2  4  6
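
    The X itself is added by make.names() only when it is needed to make a name syntactically valid, for example when the name starts with a digit. A quick check (using a made-up column name "2b") shows that a user-supplied name gets the same treatment, so an X prefix on its own does not prove the names were generated automatically:

    ```r
    # make.names() leaves valid names alone and prefixes "X" to ones starting with a digit
    make.names(c("1", "a", "2b"))
    # [1] "X1"  "a"   "X2b"

    # data.frame() applies make.names() by default (check.names = TRUE)
    names(data.frame(`2b` = 1:2))
    # [1] "X2b"
    ```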
    

    However, this is not a fixed rule: the result depends on the class of the object passed to as.data.frame() and on whether any of the default arguments were changed. The best approach is to sift through the many methods listed by methods(as.data.frame) to see whether a pattern emerges.
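
    For the plotting use case in the question, one pragmatic option is a heuristic: treat column names as automatic when they are missing or exactly match V1, V2, ... or X1, X2, ... in order. This is a sketch under that assumption only; the helpers looks_auto_colnames() and plot_labels() and the "row"/"col" fallback labels are hypothetical. It will misclassify columns that someone deliberately named X1, X2, ..., and it will not catch other generated patterns such as the a.1, a.2, ... names that data.frame(a = m) produces.

    ```r
    # Heuristic (assumption): column names count as "automatic" if absent
    # or exactly V1..Vn or X1..Xn in order.
    looks_auto_colnames <- function(df) {
        stopifnot(is.data.frame(df))
        nms <- names(df)
        if (is.null(nms)) return(TRUE)
        idx <- seq_along(nms)
        identical(nms, paste0("V", idx)) || identical(nms, paste0("X", idx))
    }

    # Hypothetical helper for a PCA-like plot: keep real names, otherwise
    # substitute generated labels.
    plot_labels <- function(df) {
        rows <- if (.row_names_info(df, type = 1L) < 0L) {
            paste0("row", seq_len(nrow(df)))   # automatic row names
        } else {
            row.names(df)
        }
        cols <- if (looks_auto_colnames(df)) {
            paste0("col", seq_len(ncol(df)))   # missing or V/X-style names
        } else {
            names(df)
        }
        list(rows = rows, cols = cols)
    }

    m <- matrix(1:6, 2)
    looks_auto_colnames(as.data.frame(m))  # TRUE  (V1 V2 V3)
    looks_auto_colnames(data.frame(m))     # TRUE  (X1 X2 X3)
    looks_auto_colnames(mtcars)            # FALSE
    plot_labels(as.data.frame(m))$cols     # "col1" "col2" "col3"
    ```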