I'm using the iris dataset for this example, since many people know it.
I scaled the first 4 variables of the dataset, and named it scaled.iris.
scaled.iris <- scale(iris[,-5])
Why is there a difference whether I now cbind a column by index
cbind(scaled.iris,iris[5])
or by variable name
cbind(scaled.iris,iris$Species)
?
The former gives me a data.frame with a column holding the actual labels ("setosa", "versicolor", ...) and the correct column name; the latter gives me a matrix whose extra column has no name and holds the numeric values 1-3.
This comes down to the class of the object you pass in, not to a quirk of cbind(). cbind() returns a data frame when any of its arguments is a data frame, and a matrix otherwise.
When a column is selected with single brackets, as in iris[5], the result is still a data frame. Selecting one column gives a data frame with one column; selecting several gives a data frame with that many columns. The same holds when the column is given by name inside the brackets, as in iris["Species"].
When a column is selected with $, as in iris$Species, it is extracted as a bare vector (here a factor) that does not carry the column name. $ extracts exactly one column at a time, so it cannot be used to select several columns at once.
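A quick class check makes this concrete. This is only a small sketch using base R and the built-in iris data; it shows that what decides the result is the extraction operator ([ ] versus $ or [[ ]]).

```r
# Single brackets keep the data frame wrapper,
# whether the column is given by position or by name.
class(iris[5])            # "data.frame"
class(iris["Species"])    # "data.frame"

# $ and [[ ]] pull out the column itself, here a factor.
class(iris$Species)       # "factor"
class(iris[[5]])          # "factor"

# A factor stores integer codes plus the level labels.
levels(iris$Species)             # "setosa" "versicolor" "virginica"
head(as.integer(iris$Species))   # 1 1 1 1 1 1
```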
Running str() on both shows the difference.
> str(iris[1])
'data.frame': 150 obs. of 1 variable:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
> str(iris$Sepal.Length)
num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
You can see that the former is a data.frame containing one numeric column, while the latter is a plain numeric vector.
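Applied to the two cbind() calls from the question, the same distinction can be sketched as follows. The names by.index, by.name and labelled are made up for illustration, and the last line is just one possible way to keep the labels while still using $, not the only one.

```r
scaled.iris <- scale(iris[, -5])   # numeric matrix, 150 x 4

# A data frame argument makes cbind() return a data frame,
# so the factor labels and the column name survive.
by.index <- cbind(scaled.iris, iris[5])
class(by.index)           # "data.frame"
names(by.index)[5]        # "Species"

# A matrix plus a bare factor stays a matrix; the factor is
# reduced to its integer codes and the new column is unnamed.
by.name <- cbind(scaled.iris, iris$Species)
is.matrix(by.name)        # TRUE
colnames(by.name)[5]      # ""
unique(by.name[, 5])      # 1 2 3

# One option: build the data frame explicitly and name the column.
labelled <- data.frame(scaled.iris, Species = iris$Species)
```

If a matrix is really what you need, keep in mind that a matrix can hold only one type, so the labels would have to be stored as codes (as above) or everything would be converted to character.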