What are permissible column objects of the form "col_*()" used in readr?


readr::read_csv is misreading some column types in a file I am loading so I want to use cols to set them manually.

In ?read_csv, it says the col_types argument should be "One of ‘NULL’, a ‘cols()’ specification, or a string. See ‘vignette("column-types")’ for more details". However, running vignette("column-types") reports that the vignette is not found, so I tried ?cols. It says it accepts "column objects created by ‘col_*()’ or their abbreviated character names".

What are the acceptable functions or abbreviated character names and where do I find that information? readr 1.1.1 btw.


Solution

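  • The column objects are col_logical(), col_integer(), col_double(), col_number(), col_character(), col_factor(), col_date(), col_datetime(), col_time(), col_guess() and col_skip(). For example: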

    library(readr)
    
    mtcars <- read_csv(readr_example("mtcars.csv"), col_types = 
                         cols(
                           mpg = col_double(),
                           cyl = col_integer(),
                           disp = col_double(),
                           hp = col_integer(),
                           drat = col_double(),
                           vs = col_integer(),
                           wt = col_double(),
                           qsec = col_double(),
                           am = col_integer(),
                           gear = col_integer(),
                           carb = col_integer()
                         )
    )
    mtcars
    
    #> # A tibble: 32 x 11
    #>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #>    <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
    #>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
    #>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
    #>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
    #>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
    #>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
    #>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
    #>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
    #>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
    #>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
    #> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
    #> # ... with 22 more rows
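
    To see which col_*() functions your installed version of readr provides, you can list the package's exports. This is only a sketch: the variable name col_funs is made up for illustration, and the exact set returned depends on the readr version.

    ```r
    library(readr)

    # Collect every exported readr function whose name starts with "col_".
    # The result varies with the installed readr version.
    col_funs <- sort(grep("^col_", getNamespaceExports("readr"), value = TRUE))
    col_funs
    ```

    Each name in the result has its own help page, e.g. ?col_double.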
    

    Alternatively, you can use the abbreviated character names: c = character, i = integer, n = number, d = double, l = logical, D = date, T = date time, t = time, ? = guess, or _/- to skip the column. They can be given per column inside cols() or cols_only(), or as a single compact string where each character represents one column.

    mtcars_select <- read_csv(readr_example("mtcars.csv"), 
                              col_types = cols_only(mpg = 'd', cyl = 'i', hp = 'i', 
                                                    qsec = 'd', gear = 'i'),
                              na = c("NA", "N/A", "-9999", "-999"))
    mtcars_select
    
    #> # A tibble: 32 x 5
    #>      mpg   cyl    hp  qsec  gear
    #>    <dbl> <int> <int> <dbl> <int>
    #>  1  21       6   110  16.5     4
    #>  2  21       6   110  17.0     4
    #>  3  22.8     4    93  18.6     4
    #>  4  21.4     6   110  19.4     3
    #>  5  18.7     8   175  17.0     3
    #>  6  18.1     6   105  20.2     3
    #>  7  14.3     8   245  15.8     3
    #>  8  24.4     4    62  20       4
    #>  9  22.8     4    95  22.9     4
    #> 10  19.2     6   123  18.3     4
    #> # ... with 22 more rows
    

    Or even shorter, as a single string with one character per column in file order (_ skips the column):

    mtcars <- read_csv(readr_example("mtcars.csv"), col_types = "di_i__d__i_")
    mtcars
    
    #> # A tibble: 32 x 5
    #>      mpg   cyl    hp  qsec  gear
    #>    <dbl> <int> <int> <dbl> <int>
    #>  1  21       6   110  16.5     4
    #>  2  21       6   110  17.0     4
    #>  3  22.8     4    93  18.6     4
    #>  4  21.4     6   110  19.4     3
    #>  5  18.7     8   175  17.0     3
    #>  6  18.1     6   105  20.2     3
    #>  7  14.3     8   245  15.8     3
    #>  8  24.4     4    62  20       4
    #>  9  22.8     4    95  22.9     4
    #> 10  19.2     6   123  18.3     4
    #> # ... with 22 more rows
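
    If only a few columns are misread, it may be easier to start from the specification readr guesses and override just those columns. The sketch below assumes the bundled mtcars.csv example file; the names path and fixed are made up for illustration.

    ```r
    library(readr)

    path <- readr_example("mtcars.csv")

    # Print the column specification readr would guess, without keeping the data.
    spec_csv(path)

    # Override only selected columns. Column objects and abbreviated names can
    # be mixed; columns not named here fall back to the default (guessing).
    fixed <- read_csv(path, col_types = cols(cyl = col_integer(), gear = "i"))

    # Show the specification that was actually used for the read.
    spec(fixed)
    ```

    The printed specification is itself a cols(...) call, so it can be copied into your script and edited.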
    

    Ref:

    https://cran.r-project.org/web/packages/readr/vignettes/readr.html
    https://www.rdocumentation.org/packages/readr/versions/1.1.1/topics/cols