r cbind iris-dataset

What is the difference in cbind when used with an index compared with a variable name?


I'm using the iris dataset for this example because many people know it.

I scaled the first 4 variables of the dataset and named the result scaled.iris: scaled.iris <- scale(iris[,-5]). Why is there a difference whether I now cbind a column by index, cbind(scaled.iris,iris[5]), or by variable name, cbind(scaled.iris,iris$Species)?

The former gives me a data.frame with a column containing the actual labels ("setosa", "versicolor", ...) and the correct column name; the latter gives me a matrix whose new column has no name and holds the numeric values 1-3 instead of the labels.


Solution

  • This comes from the class of the object you pass in, not from cbind() behaving inconsistently.

    When columns are selected from a data frame with single brackets, as in iris[5], the result is still a data frame. (The same holds for a name inside single brackets, iris["Species"].) Selecting one column gives a data frame with one column; selecting several gives a data frame with that many columns. Because one of its arguments is a data frame, cbind() returns a data frame, so the labels and the column name are kept.

    When a column is selected with $ and its name, as in iris$Species, it is extracted as a bare vector (here a factor) that does not carry the column name. $ can only extract one column at a time. Given only a matrix and a factor, cbind() returns a matrix and stores the factor's internal integer codes, which is where the values 1-3 come from.

    Running str() on both shows the difference.

    > str(iris[1])
    'data.frame':   150 obs. of  1 variable:
     $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    
    > str(iris$Sepal.Length)
     num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    

    You can see that the former is a data.frame containing one numeric vector, while the latter is a plain numeric vector.
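
    To tie this back to the question, here is a minimal sketch of both cbind() calls on the built-in iris data, plus one possible way to keep the labels while still using $. The names by_bracket, by_dollar and fixed are made up for illustration and are not from the question.

    ```r
    # scaled.iris is the numeric matrix from the question
    scaled.iris <- scale(iris[, -5])

    # `[` keeps the data frame, so cbind() returns a data frame
    by_bracket <- cbind(scaled.iris, iris[5])
    is.data.frame(by_bracket)      # TRUE
    is.factor(by_bracket$Species)  # TRUE, labels are kept

    # `$` returns a bare factor; matrix + factor gives a numeric matrix
    # that stores the factor's integer codes instead of its labels
    by_dollar <- cbind(scaled.iris, iris$Species)
    is.matrix(by_dollar)           # TRUE
    colnames(by_dollar)[5]         # "" (no name for the new column)

    # One option if you prefer `$`: build the data frame explicitly
    # and name the column yourself
    fixed <- data.frame(scaled.iris, Species = iris$Species)
    str(fixed$Species)
    ```

    If a data frame is what you want at the end, passing iris[5] or calling data.frame() with a named argument both avoid the conversion to integer codes.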