Search code examples
rtidyversedata-wranglingreadr

Specify number of columns to read when first row is missing values


I have data from a logger that inserts timestamps as rows within the comma separated data. I've sorted out a way to wrangle those timestamps into a tidy data frame (thanks to the responses to this question).

The issue I'm having now is that the timestamp lines don't have the same number of comma-separated values as the data rows (3 vs 6), and readr is defaulting to reading only in only 3 columns, despite me manually specifying column types and names for 6. Last summer (when I last used the logger) readr read the data in correctly, but to my dismay the current version (2.1.1) throws a warning and lumps columns 3:6 all together. I'm hoping that there's some option for "correcting" back to the old behaviour, or some work-around solution I haven't thought of (editing the logger files is not an option).

Example code:

library(tidyverse)

# example data
txt1 <- "
,,Logger Start 12:34
-112,53,N=1,9,15,.25
-112,53,N=2,12,17,.17
"

# example without timestamp header
txt2 <- "
-112,53,N=1,9,15,.25
-112,53,N=2,12,17,.17
"

# throws warning and reads 3 columns
read_csv(
  txt1,
  col_names = c("lon", "lat", "n", "red", "nir", "NDVI"),
  col_types = "ddcddc"
)

# works correctly
read_csv(
  txt2,
  col_names = c("lon", "lat", "n", "red", "nir", "NDVI"),
  col_types = "ddcddc"
)

# this is the table that older readr versions would create
# and that I'm hoping to get back to
tribble(
  ~lon, ~lat, ~n, ~red, ~nir, ~NDVI,
    NA,   NA, "Logger Start 12:34", NA, NA, NA,
  -112,   53, "N=1", 9, 15, ".25",
  -112,   53, "N=2",12, 17, ".17"
)

Solution

  • Use the base read.csv then convert to typle if need be:

    read.csv(text=txt1, header = FALSE,
         col.names = c("lon", "lat", "n", "red", "nir", "NDVI"))
       lon lat                  n red nir NDVI
    1   NA  NA Logger Start 12:34  NA  NA   NA
    2 -112  53                N=1   9  15 0.25
    3 -112  53                N=2  12  17 0.17