r, rows, data-cleaning

Drop rows after criteria


I have some data that I'm trying to clean up, and I noticed that 150 of my files contain rows that are subsets of previous rows. Is there a way to drop everything after certain criteria occur? I'm not sure how I'd write out sample data for this via code, so I've listed an example of the data as text below. I'd like to drop all rows at and below "Section 2".

Name,Age,Address
Section 1,,
Abby,10,1 Baker St
Alice,12,3 Main St
Becky,13,156 F St
Ben,14,2 18th St
Cameron,15,4 Journey Road
Danny,16,123 North Ave
Eric,17,325 Hill Blvd
,,
Section 2,,
Abby,10,1 Baker St
Alice,12,3 Main St
Becky,13,156 F St
Ben,14,2 18th St
,,
Section 3,,
Becky,13,156 F St
Ben,14,2 18th St
Cameron,15,4 Journey Road
Danny,16,123 North Ave
,,
Section 5,,
Alice,12,3 Main St
Becky,13,156 F St
Ben,14,2 18th St
Cameron,15,4 Journey Road
Danny,16,123 North Ave
Eric,17,325 Hill Blvd

Expected output

Name,Age,Address
Section 1,,
Abby,10,1 Baker St
Alice,12,3 Main St
Becky,13,156 F St
Ben,14,2 18th St
Cameron,15,4 Journey Road
Danny,16,123 North Ave
Eric,17,325 Hill Blvd

Solution

  • Assuming your text file is called temp.txt, you can use readLines to read it in, find the line containing 'Section 2', and keep only the lines above it. Subtracting 2 from the matched line number drops both the 'Section 2' row and the blank ',,' separator line directly above it.

    # Read the raw lines and locate the 'Section 2' marker
    tmp <- readLines('temp.txt')
    # Keep everything above it (- 2 also drops the blank ',,' line before it)
    inds <- grep('Section 2', tmp) - 2
    data <- read.csv(text = paste0(tmp[1:inds], collapse = '\n'))
    data
    #       Name Age        Address
    #1 Section 1  NA               
    #2      Abby  10     1 Baker St
    #3     Alice  12      3 Main St
    #4     Becky  13       156 F St
    #5       Ben  14      2 18th St
    #6   Cameron  15 4 Journey Road
    #7     Danny  16  123 North Ave
    #8      Eric  17  325 Hill Blvd
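
  • To apply the same idea to all 150 files, one option is to wrap the steps above in a function and lapply over the file list. This is a minimal sketch, assuming the files live in a folder called 'data' and end in .txt (both the path and the pattern are assumptions to adapt to your setup); files with no 'Section 2' marker are kept whole.

    # Assumed location and pattern of the 150 files -- adjust as needed
    files <- list.files('data', pattern = '\\.txt$', full.names = TRUE)

    read_before_section2 <- function(path) {
      tmp <- readLines(path)
      hit <- grep('Section 2', tmp)
      # Keep the whole file if it has no 'Section 2' marker
      last <- if (length(hit)) hit[1] - 2 else length(tmp)
      read.csv(text = paste0(tmp[1:last], collapse = '\n'))
    }

    # Named list of cleaned data frames, one per file
    all_data <- lapply(files, read_before_section2)
    names(all_data) <- basename(files)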