I am trying to use the tidyverse's read_delim() to read a tab-separated text file. Base R's read.table() reads it with no problem, but when I tested read_delim() with delim = "\t", I got a problem. For example, I have the file below, "test.txt". As you can see, the header is offset from the data because the first column contains row names and has no header.
T1 T2 T3
A 1 4 7
B 2 5 8
C 3 6 9
I can use base R to read this file successfully:
dat <- read.table("test.txt", header = TRUE, sep = "\t")
dat
T1 T2 T3
A 1 4 7
B 2 5 8
C 3 6 9
But when I tried to use the tidyverse's read_delim(), I got parsing problems:
dat1 <- read_delim("test.txt", delim ="\t")
Rows: 3 Columns: 3
── Column specification ──────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): T1, T3
dbl (1): T2
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
I know base R's read.table() handles this automatically, but could someone tell me if the tidyverse's read_delim() has a way to resolve this issue? Thank you! -Xiaokuan
The issue isn’t exactly that the headers are misaligned; it’s that readr doesn’t support or recognize row names at all.* readr::read_delim() therefore doesn’t account for the fact that row names don’t have a column header, and just sees three column names followed by four columns of data.
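To make the mismatch concrete, here is a minimal sketch that recreates the example file and counts the fields on each line with base R's count.fields(). The temporary file and the `n_fields` name are only for illustration. read.table() is documented to use the first column as row names when the header has one fewer field than the data rows, which is why it copes with this layout.

```r
# Recreate the example file in a temporary location (tab-separated).
# The temp path stands in for "test.txt" and is an assumption of this sketch.
path <- tempfile(fileext = ".txt")
writeLines(c("T1\tT2\tT3", "A\t1\t4\t7", "B\t2\t5\t8", "C\t3\t6\t9"), path)

# Count the tab-separated fields on each line of the file.
# The header line has one fewer field than the data lines.
n_fields <- count.fields(path, sep = "\t")
n_fields
```

The first element is the header's field count and the remaining elements are the data rows' counts; readr has no rule for reconciling the difference.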
If your goal is to import your data as a tibble, your best bet is probably to use base::read.table(), then tibble::as_tibble(), using the rownames arg to convert the row names to a regular column.
library(tibble)
dat <- read.table("test.txt", header = TRUE, sep = "\t")
as_tibble(dat, rownames = "row")
# A tibble: 3 × 4
row T1 T2 T3
<chr> <dbl> <dbl> <dbl>
1 A 1 4 7
2 B 2 5 8
3 C 3 6 9
Another option would be to manually edit your input file to include a column header above the row names.
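If editing the file by hand isn't practical, the same effect can probably be had inside readr by reading the header line separately and supplying the column names yourself. This is only a sketch: the temporary file stands in for "test.txt", and the name "row" for the row-name column is an arbitrary choice, not something readr requires.

```r
library(readr)

# Recreate the example file in a temporary location (tab-separated).
path <- tempfile(fileext = ".txt")
writeLines(c("T1\tT2\tT3", "A\t1\t4\t7", "B\t2\t5\t8", "C\t3\t6\t9"), path)

# Read only the header line and split it on tabs to get the three names.
header <- strsplit(read_lines(path, n_max = 1), "\t", fixed = TRUE)[[1]]

# Skip the header line and supply all four names explicitly,
# using "row" (an illustrative name) for the unnamed first column.
dat1 <- read_delim(
  path,
  delim = "\t",
  skip = 1,
  col_names = c("row", header),
  show_col_types = FALSE
)
dat1
```

This keeps the input file untouched and gives a tibble with the row names as an ordinary character column, much like the as_tibble(rownames = "row") approach.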
*This isn’t an oversight, by the way. It’s an intentional choice by the tidyverse team, as they believe row names to be bad practice. For example, from the tibble docs: “Generally, it is best to avoid row names, because they are basically a character column with different semantics than every other column.” Also see this interesting discussion from the tibble GitHub.