I would like to match all lines (including the first line) between two lines that start with 'SLX-', convert them into a single comma-separated line, and then append that line to a text file.
A truncated version of the original text file looks like:
SLX-9397._TC038IV_L_FLD0214.Read1.fq.gz
Sequences: 1406295
With index: 1300537
Sufficient length: 1300501
Min index: 0
Max index: 115
0 1299240
1 71
2 1
4 1
Unique: 86490
# reads processed: 86490
# reads with at least one reported alignment: 27433 (31.72%)
# reads that failed to align: 58544 (67.69%)
# reads with alignments suppressed due to -m: 513 (0.59%)
Reported 27433 alignments to 1 output stream(s)
SLX-9397._TC044II_D_FLD0197.Read1.fq.gz
Sequences: 308905
With index: 284599
Sufficient length: 284589
Min index: 0
Max index: 114
0 284290
1 16
Unique: 32715
# reads processed: 32715
# reads with at least one reported alignment: 13114 (40.09%)
# reads that failed to align: 19327 (59.08%)
# reads with alignments suppressed due to -m: 274 (0.84%)
Reported 13114 alignments to 1 output stream(s)
SLX-9397._TC047II_D_FLD0220.Read1.fq.gz
I imagine the Ruby would look like
I think my specific problem is how to find and replace the text between two specific lines.
I guess I could do this without Ruby, but seeing as I'm trying to get into Ruby...
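If the regex route is not essential, one minimal plain-Ruby sketch is to split the text into lines and start a new group at every line beginning with SLX- using Enumerable#slice_before. Each group is then joined with commas and appended to the output file. The helper names slx_blocks and append_blocks, and the choice to skip blank lines, are assumptions for illustration. Note that this version also keeps the final block, even though no SLX- line follows it.

```ruby
# Hypothetical helper: groups lines into blocks, starting a new block
# at every line that begins with "SLX-".
def slx_blocks(text)
  lines = text.each_line(chomp: true).reject(&:empty?)      # lines without "\n", blank lines skipped
  groups = lines.slice_before(/\ASLX-/)                     # new group at every SLX- line
  groups.select { |block| block.first.start_with?("SLX-") } # drop any preamble before the first SLX- line
end

# Hypothetical helper: joins each block with commas and appends it
# as one line of the output file.
def append_blocks(text, path)
  File.open(path, "a") do |file| # "a" appends instead of truncating
    slx_blocks(text).each { |block| file.puts(block.join(",")) }
  end
end

sample = <<~TEXT
  SLX-9397._TC038IV_L_FLD0214.Read1.fq.gz
  Sequences: 1406295
  SLX-9397._TC044II_D_FLD0197.Read1.fq.gz
  Sequences: 308905
TEXT

p slx_blocks(sample).map { |block| block.join(",") }
# => ["SLX-9397._TC038IV_L_FLD0214.Read1.fq.gz,Sequences: 1406295", "SLX-9397._TC044II_D_FLD0197.Read1.fq.gz,Sequences: 308905"]
```

A plain join(",") does not quote fields. That is fine for the sample lines shown, since none of them contain commas, but the CSV library is the safer choice if a field might.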
Assuming that the file's contents are already in the string str:
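require 'csv'

# "a" appends to the file instead of overwriting it ("wb" would truncate it);
# force_quotes: true makes CSV wrap every field in double quotes.
CSV.open("/tmp/file.csv", "a", force_quotes: true) do |csv|
  # Each match runs from an SLX- line up to (not including) the next SLX- line.
  # A final block with no SLX- line after it is not matched.
  str.scan(/^(SLX-.*?)(?=\R+SLX-)/m).each do |(block)|
    csv << block.split(/\R/)  # one field per line of the block (split on line breaks)
  end
end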
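To see the difference between capturing only blocks that are followed by another SLX- line and also keeping the last block, here is a self-contained sketch. It wraps the same pattern in a helper and adds a hypothetical variant whose lookahead also accepts the end of the string (\z). The constant and method names are illustrative, not part of any library; the sample text is taken from the truncated file above.

```ruby
require 'csv'

# The pattern used above: a block is only captured when another SLX- line follows it.
BETWEEN_SLX = /^(SLX-.*?)(?=\R+SLX-)/m
# Hypothetical variant: the lookahead also accepts end of string, so the last block is kept.
THROUGH_END = /^(SLX-.*?)(?=\R+SLX-|\z)/m

# Returns one array of lines per SLX- block.
def slx_rows(text, pattern = THROUGH_END)
  text.scan(pattern).map { |(block)| block.split(/\R/) }
end

# Appends one CSV row per block; the CSV library quotes fields when needed.
def append_csv(text, path, pattern = THROUGH_END)
  CSV.open(path, "a") do |csv|
    slx_rows(text, pattern).each { |row| csv << row }
  end
end

sample = <<~TEXT
  SLX-9397._TC038IV_L_FLD0214.Read1.fq.gz
  Sequences: 1406295
  SLX-9397._TC044II_D_FLD0197.Read1.fq.gz
  Sequences: 308905
TEXT

p slx_rows(sample, BETWEEN_SLX).size # => 1
p slx_rows(sample).size              # => 2
```

Which pattern fits depends on whether the last block in the real file is complete. In the truncated sample it is only a header line.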