I'm trying to read in a tab-separated table, but it keeps producing parsing failures. I think this is due to unescaped double quotes in the text. See below for an example:
concept_id concept_name domain_id vocabulary_id concept_class_id standard_concept concept_code valid_start_date valid_end_date invalid_reason
2618087 Services delivered under an outpatient speech language pathology plan of care Observation HCPCS HCPCS Modifier S GN 19990101 20991231
2618083 "opt out" physician or practitioner emergency or urgent service Observation HCPCS HCPCS Modifier S GJ 19981001 20991231
2618082 Diagnostic mammogram converted from screening mammogram on same day Observation HCPCS HCPCS Modifier S GH 19981001 20991231
Note the "opt out" in the second column, where the problem seems to originate. The following code has parsing failures:
library(readr)

df <- read_delim(
  file = "~/_data/test.csv",
  col_types = cols(
    col_integer(), col_character(), col_character(),
    col_character(), col_character(), col_character(),
    col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
    col_character()  # tenth column (invalid_reason); assumed to be character
  ),
  delim = "\t"
)
Warning: 4 parsing failures.
row col expected actual file
1 NA 10 columns 9 columns '~/_data/test.csv'
2 concept_name delimiter or quote '~/_data/test.csv'
2 concept_name closing quote at end of file '~/_data/test.csv'
2 NA 10 columns 2 columns '~/_data/test.csv'
I can't seem to find a solution.
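This resolves the issue: I needed to set the quote argument to quote = "". That disables quote handling, so the double quotes around "opt out" are read as literal characters instead of as the start of a quoted field.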
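library(readr)

df <- read_delim(
  file = "~/_data/test.csv",
  col_types = cols(
    col_integer(), col_character(), col_character(),
    col_character(), col_character(), col_character(),
    col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
    col_character()  # tenth column (invalid_reason); assumed to be character
  ),
  quote = "",  # treat double quotes as ordinary characters
  delim = "\t"
)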
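To check the fix without the full file, something like the following sketch should work. It assumes readr is installed; the three-column sample and the temporary file are hypothetical stand-ins for the real table, not the original data.

```r
library(readr)

# Hypothetical cut-down sample: three of the original columns, tab-separated
sample_lines <- c(
  "concept_id\tconcept_name\tvalid_start_date",
  "2618087\tServices delivered under an outpatient speech language pathology plan of care\t19990101",
  "2618083\t\"opt out\" physician or practitioner emergency or urgent service\t19981001"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(sample_lines, tmp)

check <- read_delim(
  file = tmp,
  delim = "\t",
  quote = "",  # no quote character: the double quotes stay in the data
  col_types = cols(
    concept_id = col_integer(),
    concept_name = col_character(),
    valid_start_date = col_date(format = "%Y%m%d")
  )
)

problems(check)        # lists any parsing failures readr recorded
check$concept_name[2]  # the value that contains the "opt out" quotes
```

If the quotes are not needed downstream, they could be stripped afterwards, for example with gsub('"', "", check$concept_name, fixed = TRUE).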