Confusion with the output of the function str

The data set birth.csv collected at the Baystate Medical Center, Springfield, USA during 1986 has the following format

After I imported the csv file (using read.csv() with a colClasses specification), the output of str() didn't match that of head(). For example, the first 6 values of the column low should be 0, but the sample printed by str() showed them as 1.

A data.frame: 6 × 9
  low age lwt race smoke ptl ht ui ftv
1   0  19 182    2     0   0  0  1   0
2   0  33 155    3     0   0  0  0   2
3   0  20 105    1     1   0  0  0   1
4   0  21 108    1     1   0  0  1   2
5   0  18 107    1     1   0  0  1   0
6   0  21 124    3     0   0  0  0   0

Could someone please explain what happened? If I built a logistic model for that imported dataset, would the result be wrong?


  • Factors (categorical variables, <fct> in the tibble column class labels) in R are stored internally as integers with 1 being the first level (or category), 2 the second level, etc., along with a lookup table mapping the integer values to their labels/levels.

    str() prints a few of the levels and then the integer values. Most other functions print the labels, not the integer values.

    It's extra confusing in your case because your labels are (character-class) integers starting at 0, so the label "0" is stored internally as the integer 1, and that 1 is what str() shows. For a somewhat clearer example, let's look at a factor with letters as the labels:

    x = factor(c("a", "b", "a", "c"))
    x        # printing shows the labels
    # [1] a b a c
    # Levels: a b c
    str(x)   # str() shows the levels, then the internal integer codes
    #  Factor w/ 3 levels "a","b","c": 1 2 1 3
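
    The same thing can be reproduced with labels like yours. This is only a minimal sketch: the vector `low` below is a made-up stand-in for your column (not the real data), and the names `codes` and `values` are introduced purely for illustration. It shows that the labels are intact and that only the internal codes are shifted by one:

    ```r
    # Hypothetical stand-in for the `low` column: a factor with labels "0" and "1"
    low <- factor(c("0", "0", "0", "1", "0", "1"))

    print(low)   # prints the labels: 0 0 0 1 0 1
    str(low)     # prints the levels, then the internal codes: 1 1 1 2 1 2

    # as.integer() returns the internal codes, not the labels
    codes <- as.integer(low)

    # Going through character first recovers the original 0/1 values
    values <- as.numeric(as.character(low))
    ```

    So the data were imported correctly; str() is just showing a different view of them. As for the model: according to the documentation for the binomial family, a factor response is handled by treating the first level as "failure" and all other levels as "success", so with levels "0" and "1" a logistic regression should model the probability of low being 1, as intended. The thing to avoid is calling as.integer() or as.numeric() directly on the factor, since that gives the codes 1 and 2 rather than 0 and 1.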