I'm trying to use Ruby and Regex to divide a long string into chunks separated by timestamps that occur throughout the string.
"10:59 a.m. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus at tincidunt ante. 3:30 a.m. Aenean interdum, quam sed tempor imperdiet, neque turpis aliquet est, at luctus justo arcu et arcu. Sed sit amet eros a sem hendrerit vestibulum faucibus sit amet nunc. Nam venenatis pharetra leo vel facilisis. 9:20 p.m. Aenean tincidunt ligula lacinia."
Here's the loop I'm running to pull out each chunk.
while text.length > 1
begin_entry = text.index(/\d{1,2}[:]\d{2}\s(a|p)[.][m][.]/)
end_entry = text.index(/\d{1,2}[:]\d{2}\s(a|p)[.][m][.]/, begin_entry + 1)
blot = text.slice!(begin_entry, end_entry)
end
When I run this, the first timestamp, which begins the entry, is captured fine. However, the end is never right.
Instead of "10:59 a.m. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus at tincidunt ante." I get "10:59 a.m. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus at tincidunt ante. 3:30 a."
And things get even more off as the loop runs through the string. The beginning of the entry is always correct with the timestamp included at the beginning of the substring. The end never is, however.
text.split(/(\d{1,2}:\d{1,2}\s[ap]\.m\.)/).drop(1).each_slice(2).map(&:join)
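For context on why the loop's end is off: String#slice!(start, length) treats its second argument as a character count, not an end index, so passing end_entry takes too many characters whenever begin_entry is greater than 0. Below is a rough sketch of both approaches, not a definitive fix. The method names chunks_by_split and chunks_by_loop and the shortened sample text are made up for illustration. The loop version passes a Range to slice! and starts looking for the next timestamp after the current match, since searching from begin_entry + 1 can re-match the tail of a two-digit hour (for example "0:59 a.m." inside "10:59 a.m.").

```ruby
# Hypothetical helper: the split approach. Because the pattern has a capture
# group, String#split keeps each timestamp in the result array.
def chunks_by_split(text, pattern = /(\d{1,2}:\d{2}\s[ap]\.m\.)/)
  text.split(pattern)   # ["", "10:59 a.m.", " Lorem ...", "3:30 a.m.", ...]
      .drop(1)          # discard whatever precedes the first timestamp
      .each_slice(2)    # pair each timestamp with the text that follows it
      .map(&:join)
end

# Hypothetical helper: the original loop, adjusted to slice by Range.
def chunks_by_loop(text, pattern = /\d{1,2}:\d{2}\s[ap]\.m\./)
  remaining = text.dup  # slice! mutates, so work on a copy
  chunks = []
  while (match = pattern.match(remaining))
    begin_entry = match.begin(0)
    # Look for the next timestamp only after the current one ends;
    # if there is none, the entry runs to the end of the string.
    end_entry = remaining.index(pattern, match.end(0)) || remaining.length
    # begin...end is an exclusive Range, so the bound is a position, not a length.
    chunks << remaining.slice!(begin_entry...end_entry)
  end
  chunks
end

# Shortened sample text, assumed for illustration only.
text = "10:59 a.m. Lorem ipsum. 3:30 a.m. Aenean interdum. 9:20 p.m. Aenean tincidunt."
p chunks_by_split(text)
p chunks_by_loop(text)
```

Both calls should print the same three entries, each starting with its timestamp. Trailing spaces are kept; add .map(&:strip) if you don't want them.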