readr - does not read columns with missing headers


I've been running into trouble when using read_tsv on files where the header line is missing entries for the last few columns of a dataset. readr does report what is going on through a warning/problem, but the result seems to run counter to the way readr is supposed to handle these cases, as described here: https://github.com/tidyverse/readr/issues/189

This example call to read_csv is taken from the above link:

read_csv("a,b\n1,2,3,4")
#> Warning: 1 parsing failure.
#> row # A tibble: 1 x 5 col     row   col  expected    actual         file expected   <int> <chr>     <chr>     <chr>        <chr> actual 1     1  <NA> 2 columns 4 columns literal data file # A tibble: 1 x 5
#> 
#> # A tibble: 1 x 2
#>       a     b
#>   <int> <int>
#> 1     1     2
#> Warning message:
#> In rbind(names(probs), probs_f) :
#>   number of columns of result is not a multiple of vector length (arg 2)

Note, I'm using R v3.4.2 and readr v1.1.1. According to previous experience with readr (and the link above), readr should still read in the columns with the missing headers and automatically assign them the names X1 and X2. Did readr change the way it's supposed to handle these cases? Is this a tibble side-effect?


Solution

  • I followed MrFlick's suggestion and posted this to readr's GitHub page. It looks like this is actually a bug: https://github.com/tidyverse/readr/issues/762. Hopefully we'll see a fix in the next version.
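
Until a fix is released, one possible workaround is to count the fields yourself and pass a padded set of column names explicitly. This is only a sketch, not something confirmed in the linked issue. It assumes a comma-delimited file whose header contains no quoted delimiters, and it uses a temporary file with made-up contents. The X3/X4 names are placeholders chosen for illustration.

```r
library(readr)

# Hypothetical example file: the header names 2 columns, the data row has 4 fields.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2,3,4"), path)

# Number of fields on every line of the file (header included).
n_fields <- count_fields(path, tokenizer_csv())

# Names actually present in the header line (naive split, assumes no quoted commas).
header <- strsplit(readLines(path, n = 1), ",", fixed = TRUE)[[1]]

# Pad the header with placeholder names for the unnamed trailing columns.
n_missing <- max(n_fields) - length(header)
col_names <- c(header, paste0("X", length(header) + seq_len(n_missing)))

# Skip the original header line and supply the padded names instead.
df <- read_csv(path, col_names = col_names, skip = 1)
```

For tab-separated input, the same idea should carry over by swapping in tokenizer_tsv(), read_tsv(), and a "\t" split.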