I have several csv files recorded by air sensors (TSI BlueSky and AirAssure). The device records the data to its SD card. As with many machine-recorded files, the first 59 lines are notes that start with # and record basic information such as serial numbers. These notes are easy to skip by adding skip=59. However, the notes can also appear in the middle of a csv file, interrupting the records, and when that happens the column names and units are repeated as well. I have an example below.
#note | ||
#note | ||
#note | ||
#note | ||
col1 | col2 | col3 |
unit1 | unit2 | unit3 |
1 | 2 | 3 |
1 | 2 | 3 |
1 | 2 | 3 |
#note | ||
#note | ||
#note | ||
#note | ||
col1 | col2 | col3 |
unit1 | unit2 | unit3 |
1 | 2 | 3 |
1 | 2 | 3 |
1 | 2 | 3 |
How can I skip all the note and unit rows and keep only one row of column names plus all the numbers?
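This code reads the data from a text string. If you are loading the csv file from a folder, pass the file path instead and check that sep matches the file's delimiter (for example "," for a standard csv or "\t" for a tab-separated file).
The comment.char parameter filters out the notes (every line starting with #note). The repeated column-name and unit rows are then removed with filter.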
text <-
"
#note
#note
#note
#note
col1 col2 col3
unit1 unit2 unit3
1 2 3
1 2 3
1 2 3
#note
#note
#note
#note
col1 col2 col3
unit1 unit2 unit3
1 2 3
1 2 3
1 2 3
"
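library(dplyr)
# comment.char = "#" drops the note lines. sep = "" splits on any whitespace,
# which matches the sample text above; use sep = "," or sep = "\t" for a real file.
df <- read.csv(text = text, comment.char = "#", sep = "")
# Drop the unit rows and the repeated column-name rows.
df <- filter(df, !col1 %in% c('col1', 'unit1'))
# The unit rows force every column to be read as text, so convert back to numbers.
df <- type.convert(df, as.is = TRUE)
df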
Output:
  col1 col2 col3
1    1    2    3
2    1    2    3
3    1    2    3
4    1    2    3
5    1    2    3
6    1    2    3
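If you would rather not hard-code the column and unit names, here is a minimal base R sketch of the same idea applied to a file on disk. The function name read_sensor_csv is hypothetical, and it assumes a comma-separated file in which the first non-note line is the header, the line right after it is the unit row, and the repeated header and unit rows are character-for-character identical to the first ones. I have not tested it on real BlueSky or AirAssure files.

```r
# Hypothetical helper (not part of any package): read one sensor file,
# keeping a single header row and all data rows.
read_sensor_csv <- function(path, sep = ",") {
  lines <- readLines(path)
  trimmed <- trimws(lines)
  # Drop blank lines and note lines that start with "#".
  lines <- lines[nzchar(trimmed) & !startsWith(trimmed, "#")]
  header <- lines[1]
  # Assumption: the unit row directly follows the first header row.
  units <- lines[2]
  # Remove every copy of the header and unit rows, then put one header back.
  body <- lines[!lines %in% c(header, units)]
  read.csv(text = c(header, body), sep = sep, header = TRUE)
}

# Small demo file that mimics the layout in the question.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "#note", "#note",
  "col1,col2,col3", "unit1,unit2,unit3",
  "1,2,3", "1,2,3",
  "#note",
  "col1,col2,col3", "unit1,unit2,unit3",
  "1,2,3"
), tmp)

df <- read_sensor_csv(tmp)
str(df)
```

Because the unit rows are removed before read.csv parses the text, the columns are read as numbers directly and no type conversion is needed afterwards. To process several files, something like lapply(list.files(folder, full.names = TRUE), read_sensor_csv) should work, where folder is your own directory path.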