The data set birth.csv, collected at the Baystate Medical Center, Springfield, USA during 1986, has the following format

After I imported the csv file (using read.csv() with a colClasses specification), the output of str() didn't match that of head(). For example, the first 6 values of the column low should be 0, but the sample printed by str() shows them as 1:
'data.frame': 189 obs. of 9 variables:
$ low : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ... # shouldn't they be 0 0 0 0... instead?
$ age : num 19 33 20 21 18 21 22 17 29 26 ...
$ lwt : num 182 155 105 108 107 124 118 103 123 113 ...
$ race : Factor w/ 3 levels "1","2","3": 2 3 1 1 1 3 1 3 1 1 ...
$ smoke: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 2 ...
$ ptl : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ht : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ui : Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 1 1 1 ...
$ ftv : Factor w/ 3 levels "0","1","2": 1 3 2 3 1 1 2 2 2 1 ...
A data.frame: 6 × 9
low age lwt race smoke ptl ht ui ftv
<fct><dbl><dbl><fct><fct><fct><fct><fct><fct>
1 0 19 182 2 0 0 0 1 0
2 0 33 155 3 0 0 0 0 2
3 0 20 105 1 1 0 0 0 1
4 0 21 108 1 1 0 0 1 2
5 0 18 107 1 1 0 0 1 0
6 0 21 124 3 0 0 0 0 0
Could someone please explain what happened? If I built a logistic model for that imported dataset, would the result be wrong?
Factors (categorical variables, <fct> in the column class labels of your head() output) in R are stored internally as integers, with 1 being the first level (or category), 2 the second level, etc., along with a lookup table mapping the integer values to their labels/levels. str() shows a few of the levels and then the integer values. Most other functions print the labels, not the integer values.
It's extra confusing in your case because your labels are (character-class) integers starting at 0, so the label "0" is stored as the integer 1 and the label "1" as the integer 2. For a somewhat clearer example, let's look at a factor with letters as the labels:
x = factor(c("a", "b", "a", "c"))
x
# [1] a b a c
# Levels: a b c
str(x)
# Factor w/ 3 levels "a","b","c": 1 2 1 3
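
To tie this back to the low column, here is a minimal sketch using a made-up four-element factor (the name low and its values are illustrative only, not read from birth.csv). It shows the lookup table, the internal integer codes that str() prints, and the labels that head() prints:

```r
# Hypothetical stand-in for the `low` column: labels are the strings "0" and "1"
low <- factor(c("0", "0", "1", "0"))

levels(low)        # the lookup table: "0" "1"
as.integer(low)    # the internal codes (what str() shows): 1 1 2 1
as.character(low)  # the labels (what head() shows): "0" "0" "1" "0"

# To get the original numbers back, convert via the labels, not the codes
low_num <- as.numeric(as.character(low))
low_num            # 0 0 1 0
```

So the data were imported correctly: a 1 in the str() output just means "first level", which here carries the label "0". The thing to avoid is calling as.integer() or as.numeric() directly on the factor when you want the original values, since that returns the codes.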