ruby regex textmate

Matching all lines between two lines recursively in Ruby


I would like to match all lines (including the first line) between two lines that start with 'SLX-', convert them to a single comma-separated line, and then append that line to a text file.

A truncated version of the original text file looks like:

SLX-9397._TC038IV_L_FLD0214.Read1.fq.gz
Sequences: 1406295
With index: 1300537
Sufficient length: 1300501
Min index: 0
Max index: 115
0       1299240
1       71
2       1
4       1
Unique: 86490
# reads processed: 86490
# reads with at least one reported alignment: 27433 (31.72%)
# reads that failed to align: 58544 (67.69%)
# reads with alignments suppressed due to -m: 513 (0.59%)
Reported 27433 alignments to 1 output stream(s)
SLX-9397._TC044II_D_FLD0197.Read1.fq.gz
Sequences: 308905
With index: 284599
Sufficient length: 284589
Min index: 0
Max index: 114
0       284290
1       16
Unique: 32715
# reads processed: 32715
# reads with at least one reported alignment: 13114 (40.09%)
# reads that failed to align: 19327 (59.08%)
# reads with alignments suppressed due to -m: 274 (0.84%)
Reported 13114 alignments to 1 output stream(s)
SLX-9397._TC047II_D_FLD0220.Read1.fq.gz

I imagine the Ruby would look like:

  1. Convert all \n between two lines starting with SLX- to commas
  2. Save the modified text as a new text file (or, even better, a CSV file).

I think I specifically have a problem with how to find and replace between two specific lines.

I guess I could do this without using Ruby, but seeing as I'm trying to get into Ruby...


Solution

  • Assuming that you have your string in str:

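    require 'csv'
    CSV.open("/tmp/file.csv", "wb") do |csv|
      # Each match runs from a line starting with SLX- up to (not including)
      # the next SLX- line; a final record with no SLX- line after it is skipped.
      str.scan(/^(SLX-.*?)(?=\R+SLX-)/m).map do |s|
        s.first.split($/).map do |el|               # split on newlines ($/ is "\n" by default)
          "'#{el}'"                                 # wrap each value in single quotes
        end
      end.each do |line|                            # one array of fields per record
        csv << line                                 # write the record as one CSV row
      end
    end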
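
  • As an alternative to the regex, here is a minimal sketch that groups the lines with Enumerable#slice_before instead. The helper names slx_records and slx_to_csv and the sample data are made up for illustration. It assumes every record starts with a line beginning with SLX-, and it leaves quoting to the CSV library rather than adding quotes by hand:

```ruby
require 'csv'

# Hypothetical helper: splits the text into records, starting a new
# record at every line that begins with "SLX-".
def slx_records(text)
  text.each_line(chomp: true)
      .reject(&:empty?)                                      # ignore blank lines
      .slice_before { |line| line.start_with?('SLX-') }      # new group at each header
      .select { |record| record.first.start_with?('SLX-') }  # drop anything before the first header
      .to_a
end

# Hypothetical helper: renders one CSV row per record.
def slx_to_csv(text)
  CSV.generate do |csv|
    slx_records(text).each { |record| csv << record }
  end
end

# Made-up sample in the same shape as the file in the question.
sample = <<~TEXT
  SLX-1.a.fq.gz
  Sequences: 10
  Unique: 5
  SLX-2.b.fq.gz
  Sequences: 20
TEXT

puts slx_to_csv(sample)
```

    Unlike the lookahead version, this also keeps the last record in the file, even when no further SLX- line follows it. To work on a real file, something like File.write("out.csv", slx_to_csv(File.read("in.txt"))) should do, with both paths being placeholders.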