
The read_csv() function is reading certain columns with a column type different from the rest


I'm using read_csv() from the readr package to import some data. For some reason, two of the columns (2013 and 2014) were read as character, while the rest were read as double. This was not a problem until I tried to tidy the data frame, at which point I got this error:

Error in `pivot_longer()` at src/tidy_tertiary.R:5:2:
! Can't combine `2010` <double> and `2013` <character>.

The .csv file is formatted as follows:

country,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
Austria,3.5723,3.71357,3.7849,3.85794,3.82108,3.85482,3.91019,3.90898,3.91666,3.86284,3.86712
Belgium,3.31314,3.41688,3.49597,3.5459,3.59628,3.66023,3.70912,3.87233,3.82212,3.87927,3.91544

and so on. The code that leads to the error is this:

library(tidyverse)

tertiary <- tertiary |>
  pivot_longer(
    cols = starts_with("20"),
    names_to = "year",
    values_to = "tertiary")

Thank you in advance!


Solution

  • All the year columns are of type double in the sample data you posted, so the offending values are probably in rows you did not show. Examine your CSV file carefully to make sure the year columns contain only numeric values, with no stray character strings in them. If they do contain such strings, the following solutions might give you NA or throw an error.

    First, you can specify column types in read_csv(). Set the default column type to double ("d") and override the country column, which should be character ("c").

    library(tidyverse)
    
    read_csv("your_file.csv", col_types = cols(.default = "d", country = "c"))
    

    Or convert everything except country to numeric after you have read in the CSV file (assuming it is assigned to df).

    df %>% mutate(across(-country, as.numeric))
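To find out why a column was guessed as character in the first place, it can help to look for the values that do not parse as numbers. The sketch below is only an illustration: the inline data and the ".." placeholder are assumptions made up for the example, not taken from the real file, and it assumes readr 2.0 or later (for I() and show_col_types).

```r
library(readr)
library(dplyr)
library(tidyr)

# Hypothetical data: the ".." placeholder for a missing value is an assumption
csv_text <- I("country,2010,2013\nAustria,3.5723,3.85794\nBelgium,3.31314,..\n")

# With default guessing, the ".." entry makes 2013 a character column
raw <- read_csv(csv_text, show_col_types = FALSE)

# Rows whose 2013 value is present but cannot be converted to a number
bad <- raw |>
  filter(!is.na(`2013`), is.na(suppressWarnings(as.numeric(`2013`))))

# Declaring the placeholder as missing lets readr guess double for every year
tertiary <- read_csv(csv_text, na = c("", "NA", ".."), show_col_types = FALSE)

# All year columns now share one type, so they can be combined
tertiary_long <- tertiary |>
  pivot_longer(
    cols = starts_with("20"),
    names_to = "year",
    values_to = "tertiary")
```

If the file uses a different placeholder, or has something else such as footnote markers in the cells, the na argument would need adjusting accordingly; inspecting bad first shows what is actually there.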