
readr forcing column type


I am trying to read in a CSV file and force some of its columns to a certain type. But the very last line of my code gives me an error: "Error in is.list(col_types) : Unknown shortcut: g"

Any advice, please? Thank you!

library(readr)

# Create data frame & write it out:
temp <- data.frame(a = 1:1001,
                   mystring_b = c(rep(NA, 1000), "1"),
                   mystring_c = c(rep(NA, 1000), "2"))
write.csv(temp, "temp.csv", row.names = FALSE)

# Grab its names:
temp_labels <- names(read_csv("temp.csv", n_max = 0))

# Specify data type - for each column:
labels_type <- ifelse(grepl("mystring", temp_labels), "numeric", "guess")

# Reading in while forcing column types:
temp <- read_csv("temp.csv", col_types = labels_type)

# Error in is.list(col_types) : Unknown shortcut: g

Solution

  • Here's an excerpt from the description of the col_types argument on the help page ?read_csv:

    col_types

    ... Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, D = date, T = date time, t = time, ? = guess, or _/- to skip the column.

    So, as the error message says, "g" is not an accepted shortcut. You should use "?" instead.

    Also, even if read_csv seems to get by with your "numeric" specification by taking its first character, that is luck rather than documented behavior, so you should use "n" to match the documentation. In fact, if you look at the examples, the intent is to pass a single string with one character per column, not a vector of strings with length > 1. It is best to match the documentation, with something like this:

    labels_type <- paste(ifelse(grepl("mystring", temp_labels), "n", "?"), collapse = "")
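For completeness, here is a minimal end-to-end sketch of the fix applied to the example from the question. It assumes readr is installed; `csv_path`, `result`, `number_cols`, `spec`, and `result2` are names I made up for illustration, and the file is written to a temporary location rather than "temp.csv". The second half shows an alternative that builds the specification with `cols()` instead of the compact string, which may be easier to read when there are many columns.

```r
library(readr)

# Recreate the example file from the question (in a temporary location).
temp <- data.frame(a = 1:1001,
                   mystring_b = c(rep(NA, 1000), "1"),
                   mystring_c = c(rep(NA, 1000), "2"))
csv_path <- tempfile(fileext = ".csv")
write.csv(temp, csv_path, row.names = FALSE)

# Read only the header row to get the column names.
temp_labels <- names(read_csv(csv_path, n_max = 0,
                              col_types = cols(.default = col_character())))

# Compact string: one shortcut character per column.
# "n" = number for the mystring columns, "?" = guess for everything else.
labels_type <- paste(ifelse(grepl("mystring", temp_labels), "n", "?"),
                     collapse = "")

result <- read_csv(csv_path, col_types = labels_type)

# Alternative: an explicit specification built with cols().
# Columns not named here fall back to guessing via .default.
number_cols <- temp_labels[grepl("mystring", temp_labels)]
spec <- do.call(cols, c(list(.default = col_guess()),
                        setNames(rep(list(col_number()), length(number_cols)),
                                 number_cols)))

result2 <- read_csv(csv_path, col_types = spec)
```

Either way the mystring columns come back as numeric even though their first 1000 rows are NA, which is the case where type guessing is most likely to go wrong.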