ANSWERED: Thank you so much Bob, ffs the issue was not specifying comment='#'. Why this works, when 'skip' should've skipped the offending lines remains a mystery. Also see Gray's comment re: Excel's 'Text to Columns' feature for a non-R solution.
Hey folks,
this has been a demon on my back for ages.
The data I work with is always a collection of tab-delimited .txt files, so my analysis always begins with gathering the file paths to each, feeding those into read.csv(), and binding the results into a df.
library(tidyverse)  # for %>% and map_df()

dat <- list.files(
  path = 'data',
  pattern = '\\.txt$',  # list.files() expects a regex, not a glob
  full.names = TRUE,
  recursive = TRUE
) %>%
  map_df( ~read.csv( ., sep='\t', skip=16) ) # actual data begins at line 16
This does exactly what I want, but I've been transitioning to tidyverse over the last few years.
I don't mind using utils::read.csv(); my datasets are usually small, so the speed benefit of readr wouldn't be felt. But for consistency's sake I'd rather use readr.
When I do the same but sub in readr::read_tsv(), i.e.,
dat <-
  .... same call to list.files()
  %>%
  map_df( ~read_tsv( ., skip=16 ))
I always get an empty (0x0) table. But it seems to be 'reading' the data, because I get a printout of 'Parsed with column specification: cols()' for every column in my data.
Clearly I'm misunderstanding here, but I don't know what about it I don't understand, which has made my search for answers challenging & fruitless.
So... what am I doing wrong here?
Thanks in advance!
edit: an example snippet of (one of) my data files was requested, hope this formats well!
# KLIBS INFO
# > KLibs Commit: 11a7f8331ba14052bba91009694f06ae9e1cdd3d
#
# EXPERIMENT SETTINGS
# > Trials Per Block: 72
# > Blocks Per Experiment: 8
#
# SYSTEM INFO
# > Operating System: macOS 10.13.4
# > Python Version: 2.7.15
#
# DISPLAY INFO
# > Screen Size: 21.5" diagonal
# > Resolution: 1920x1080 @ 60Hz
# > View Distance: 57 cm
PID search_type stimulus_type present_absent response rt error
3 time COLOUR present absent 5457.863881 TRUE
3 time COLOUR absent absent 5357.009108 FALSE
3 time COLOUR present present 2870.76412 FALSE
3 time COLOUR absent absent 5391.404728 FALSE
3 time COLOUR present present 2686.6131 FALSE
3 time COLOUR absent absent 5306.652878 FALSE
edit: Using Jukob's suggestion
files <- list.files(
  path = 'data',
  pattern = '\\.txt$',  # list.files() expects a regex, not a glob
  full.names = TRUE,
  recursive = TRUE
)
for (i in seq_along(files)) {  # seq_along() is safe even if files is empty
  print(read_tsv(files[i], skip=16))
}
prints:
Parsed with column specification:
cols()
# A tibble: 0 x 0
... for each file
If I print files, I do get the correct list of file paths. If I remove skip=16 I get:
Parsed with column specification:
cols(
`# KLIBS INFO` = col_character()
)
Warning: 617 parsing failures.
row col  expected     actual                                     file
 15  -- 1 columns 21 columns 'data/raw/2019/colour/p1.2019-02-28.txt'
 16  -- 1 columns 21 columns 'data/raw/2019/colour/p1.2019-02-28.txt'
 17  -- 1 columns 21 columns 'data/raw/2019/colour/p1.2019-02-28.txt'
 18  -- 1 columns 21 columns 'data/raw/2019/colour/p1.2019-02-28.txt'
 19  -- 1 columns 21 columns 'data/raw/2019/colour/p1.2019-02-28.txt'
... ... ......... .......... ........................................
See problems(...) for more details.
... for each file
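One thing worth checking at this point is how many leading '#' lines the files really have. If I'm counting right, the snippet above has 15 of them, and the parsing failures start at row 15 (i.e. file line 16), which would put the column header on line 16. Below is a rough base-R sketch for counting them instead of hardcoding skip. It assumes every header line starts with '#'; count_comment_lines() and the demo file are made up for illustration and are not part of readr or of the original data.

```r
# Hypothetical helper: number of leading lines that start with `prefix`.
count_comment_lines <- function(path, prefix = "#") {
  lines <- readLines(path, warn = FALSE)
  is_comment <- startsWith(lines, prefix)
  first_data <- which(!is_comment)[1]  # first non-comment line, NA if none
  if (is.na(first_data)) length(lines) else first_data - 1L
}

# Small stand-in file (made-up content modelled on the snippet above).
demo_path <- tempfile(fileext = ".txt")
writeLines(c(
  "# KLIBS INFO",
  "# > Trials Per Block: 72",
  "#",
  paste("PID", "response", "rt", "error", sep = "\t"),
  paste("3", "absent", "5457.863881", "TRUE", sep = "\t"),
  paste("3", "present", "2870.76412", "FALSE", sep = "\t")
), demo_path)

n_skip <- count_comment_lines(demo_path)

# Skipping exactly the comment lines leaves the column header as line 1.
demo <- read.delim(demo_path, skip = n_skip)
print(n_skip)
print(demo)
```

If the number this returns for a real file is one less than the skip value being passed, the header row is being skipped along with the comments.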
FWIW I was able to solve the problem using your snippet by doing something along the following lines:
# Didn't work for me since when I copy and paste your snippet,
# the tabs become spaces, but I think in your original file
# the tabs are preserved so this should work for you
read_tsv("dat.tsv", comment = "#")
# This works for my case
read_table2("dat.tsv", comment = "#")
Didn't even need to specify the skip argument!
But also, no idea why using skip and not comment will fail... :(
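For completeness, here is a minimal sketch of how the comment = "#" fix might slot back into the multi-file workflow from the question. It builds two tiny stand-in files in a temp folder so it runs on its own. The file contents, the make_file() helper and the folder name are all invented for illustration, and it uses map() plus bind_rows() where the question used map_df(), which should give the same result.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical helper that writes a small tab-delimited file with a
# '#'-prefixed header block, loosely modelled on the snippet in the question.
make_file <- function(path, pid) {
  writeLines(c(
    "# KLIBS INFO",
    "# > Trials Per Block: 72",
    "#",
    paste("PID", "search_type", "response", "rt", "error", sep = "\t"),
    paste(pid, "time", "absent", "5457.863881", "TRUE", sep = "\t"),
    paste(pid, "time", "present", "2870.76412", "FALSE", sep = "\t")
  ), path)
}

data_dir <- file.path(tempdir(), "read_tsv_demo")
dir.create(data_dir, showWarnings = FALSE)
make_file(file.path(data_dir, "p1.txt"), 1)
make_file(file.path(data_dir, "p2.txt"), 2)

files <- list.files(
  path = data_dir,
  pattern = "\\.txt$",
  full.names = TRUE,
  recursive = TRUE
)

# comment = "#" drops the header block, so no skip argument is needed.
dat <- files %>%
  map(~ read_tsv(.x, comment = "#")) %>%
  bind_rows()

print(dat)
```

With the real data, data_dir would just be 'data' and the make_file() part goes away.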