I have some data that I'm trying to clean up, and I noticed that I have 150 files containing rows that are subsets of previous rows. Is there a way to drop everything once certain criteria occur? I'm not sure how I'd write out sample data for this via code, so I've listed an example of the data as text below. I'd like to drop all rows at and below "Section 2".
Name,Age,Address
Section 1,,
Abby,10,1 Baker St
Alice,12,3 Main St
Becky,13,156 F St
Ben,14,2 18th St
Cameron,15,4 Journey Road
Danny,16,123 North Ave
Eric,17,325 Hill Blvd
,,
Section 2,,
Abby,10,1 Baker St
Alice,12,3 Main St
Becky,13,156 F St
Ben,14,2 18th St
,,
Section 3,,
Becky,13,156 F St
Ben,14,2 18th St
Cameron,15,4 Journey Road
Danny,16,123 North Ave
,,
Section 5,,
Alice,12,3 Main St
Becky,13,156 F St
Ben,14,2 18th St
Cameron,15,4 Journey Road
Danny,16,123 North Ave
Eric,17,325 Hill Blvd
Expected output
Name,Age,Address
Section 1,,
Abby,10,1 Baker St
Alice,12,3 Main St
Becky,13,156 F St
Ben,14,2 18th St
Cameron,15,4 Journey Road
Danny,16,123 North Ave
Eric,17,325 Hill Blvd
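For reproducibility, here is a minimal R sketch that writes the sample above to a file via code; the file name temp.txt is an assumption, chosen to match the answer below.

# Minimal sketch: write the sample data shown above to 'temp.txt' (assumed name)
writeLines(c(
  'Name,Age,Address',
  'Section 1,,',
  'Abby,10,1 Baker St',
  'Alice,12,3 Main St',
  'Becky,13,156 F St',
  'Ben,14,2 18th St',
  'Cameron,15,4 Journey Road',
  'Danny,16,123 North Ave',
  'Eric,17,325 Hill Blvd',
  ',,',
  'Section 2,,',
  'Abby,10,1 Baker St',
  'Alice,12,3 Main St',
  'Becky,13,156 F St',
  'Ben,14,2 18th St',
  ',,',
  'Section 3,,',
  'Becky,13,156 F St',
  'Ben,14,2 18th St',
  'Cameron,15,4 Journey Road',
  'Danny,16,123 North Ave',
  ',,',
  'Section 5,,',
  'Alice,12,3 Main St',
  'Becky,13,156 F St',
  'Ben,14,2 18th St',
  'Cameron,15,4 Journey Road',
  'Danny,16,123 North Ave',
  'Eric,17,325 Hill Blvd'
), 'temp.txt')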
Assuming your text file is called temp.txt, you can use readLines to read it in, find the line containing 'Section 2', and keep only the lines above it. Subtracting 2 from that index also drops the blank separator line that sits just above "Section 2".
tmp <- readLines('temp.txt')                 # read the raw lines of the file
inds <- grep('Section 2', tmp) - 2           # index of the last line to keep
data <- read.csv(text = paste0(tmp[1:inds], collapse = '\n'))  # parse the kept lines as CSV
data
#       Name Age        Address
#1 Section 1  NA
#2      Abby  10     1 Baker St
#3     Alice  12      3 Main St
#4     Becky  13       156 F St
#5       Ben  14      2 18th St
#6   Cameron  15 4 Journey Road
#7     Danny  16  123 North Ave
#8      Eric  17  325 Hill Blvd
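Since you mentioned having 150 such files, here is a sketch of how the same logic could be applied to all of them. The directory name 'data', the .txt pattern, and the helper name drop_from_section2 are assumptions; files without a 'Section 2' line are kept whole.

# Hypothetical helper wrapping the logic above; 'Section 2' is the cutoff marker
drop_from_section2 <- function(path) {
  tmp <- readLines(path)
  ind <- grep('Section 2', tmp, fixed = TRUE)[1] - 2   # NA if no match
  if (is.na(ind)) {
    # no 'Section 2' line: keep the whole file
    return(read.csv(text = paste0(tmp, collapse = '\n')))
  }
  read.csv(text = paste0(tmp[1:ind], collapse = '\n'))
}

# Assumed layout: all 150 files live in a 'data' directory with a .txt extension
files <- list.files('data', pattern = '\\.txt$', full.names = TRUE)
cleaned <- lapply(files, drop_from_section2)
names(cleaned) <- basename(files)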