
Regular expression to extract post number and data from string


I have a large text that looks like this:

  some_data I POST postdata_1 IV POST postdata_4 III POST postdata_3 II POST postdata_2

Each piece of post data is preceded by its post number, written as a Roman numeral, followed by the word 'POST'.

I want to wrap each number and its data in tags, like this:

    <post number>
      I
    </post number>

    <post data>
      postdata_1
    </post data>

And so on for every post.

Can someone help me out with a regular expression for this? I'm using Ruby.


Solution

  • If I understand correctly, this should work as you expect:

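    # Matches a Roman numeral (note: this pattern can also match the empty string).
    roman_number = /M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})/
    # Captures the numeral, then the data lazily up to the next "<numeral> POST" or end of line.
    regex = /(#{roman_number})\sPOST\s(.+?)(?=\s#{roman_number}\sPOST|$)/
    str.scan(regex) do |post_number, post_data|
      ...
    end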
    

    The Roman numeral regex is by paxdiablo.
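
    To get from the scan to the tagged output asked for in the question, something like the sketch below might do. It is only one possible approach and rests on a few assumptions: numerals are upper case and separated from the surrounding text by whitespace, and the names NUMERAL, POST_REGEX and tag_posts are made up for this example. Because the Roman numeral pattern can match the empty string, the sketch adds a lookahead and word boundaries so that at least one numeral character is required, and it uses the m flag and \z so that post data may span several lines.

```ruby
# Roman numeral pattern from the answer. It can match the empty string,
# so NUMERAL below adds a lookahead that requires at least one numeral character.
ROMAN = /M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})/

# A non-empty numeral standing as its own word (assumption: upper case,
# separated from neighbouring text by whitespace).
NUMERAL = /\b(?=[MDCLXVI])#{ROMAN}\b/

# Numeral, "POST", then the data matched lazily up to the next
# "<numeral> POST" or the end of the string. The m flag lets "." span newlines.
POST_REGEX = /(#{NUMERAL})\s+POST\s+(.+?)(?=\s+#{NUMERAL}\s+POST\b|\s*\z)/m

# Hypothetical helper: builds the tagged text for every post found in str.
def tag_posts(str)
  str.scan(POST_REGEX).map do |post_number, post_data|
    "<post number>\n  #{post_number}\n</post number>\n\n" \
    "<post data>\n  #{post_data}\n</post data>\n"
  end.join("\n")
end

str = "some_data I POST postdata_1 IV POST postdata_4 III POST postdata_3 II POST postdata_2"
puts tag_posts(str)
```

    For the sample string this prints four number/data blocks in the order they appear in the text (I, IV, III, II). If the posts need to be in numeric order, the numerals would have to be converted to integers and sorted first, which this sketch does not attempt.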