xml, regex, bbedit

Need regex help to modify an XML file


I'm trying to modify an XML file whose elements hold the opening times for branches of a business. The file is inconsistent: some branches have just an opening time and a closing time, while others have an opening time, a closing time for lunch, a post-lunch opening time, and a final closing time.

Examples of both types below:

<monday>10.00,17.00</monday>
<monday>09.00,12.30,13.30,17.00</monday>

I want to reformat these strings to a better format such as the ones below:

<monday>
  <open>10.00</open>
  <lunch></lunch>
  <close>17.00</close>
</monday>

<monday>
  <open>09.00</open>
  <lunch>12.30 - 13.30</lunch>
  <close>17.00</close>
</monday>

I've been trying to use BBEdit regular expressions on my Mac to make the changes, but I'm having difficulty. I think the problem is that I don't know how to make the regular expression replace only a subset of the text it matches. For example, in pseudocode, I want it to do this:

replace <monday>time1,time2</monday>
with <monday><open>time1</open><lunch></lunch><close>time2</close></monday>

replace <monday>time1,time2,time3,time4</monday>
with <monday><open>time1</open><lunch>time2 - time3</lunch><close>time4</close></monday>

I'm not too familiar with regular expressions, so I'm sure I'm making some errors, but so far I've been trying the following:

replace >#+\.#+,#+\.#+< with ><open>#+\.#+</open><lunch></lunch><close>#+\.#+</close><

I understand this isn't going to work anyway, because the replacement would insert the literal string '#+' rather than the digits that #+ matched.

How can I achieve this, by regex or other means? And how do I tell the regular expression to match on a whole expression but replace only a subset of the characters it matches?


Solution

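  • Well, I figured it out quicker than I expected. The trick is to wrap each part I want to keep in parentheses (a capture group) and then refer to those groups in the replace string as \1, \2, and so on. Here are the expressions I used: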

    I used the following find string:

    (<[a-z]+day>)([0-9]+\.[0-9]+),([0-9]+\.[0-9]+)(</[a-z]+day>)
    

    ...and the following replace string:

    \1<open>\2</open><lunch></lunch><close>\3</close>\4
    

    to match lines like the following:

    <monday>10.00,17.00</monday>
    

    which resulted in the following output:

    <monday><open>10.00</open><lunch></lunch><close>17.00</close></monday>
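
The find/replace pair above only covers the two-time form. As a rough sketch of how the same capture-group idea might extend to the four-time (lunch break) form, here is a Python version using the standard `re` module. The two-time pattern is copied from the solution. The four-time pattern and the function name `reformat_hours` are my own assumptions, not something from the original post. The same four-time find and replace strings should presumably also work in BBEdit's Grep mode, but I have only reasoned about the Python behaviour.

```python
import re

# A single time such as 09.00 or 17.00 (same sub-pattern as the solution).
TIME = r"([0-9]+\.[0-9]+)"

# Two-time form, identical to the find string in the solution:
# group 1 = opening day tag, groups 2-3 = times, group 4 = closing day tag.
TWO_TIMES = re.compile(r"(<[a-z]+day>)" + TIME + "," + TIME + r"(</[a-z]+day>)")

# Four-time form (assumed extension, not in the original post):
# group 1 = opening day tag, groups 2-5 = times, group 6 = closing day tag.
FOUR_TIMES = re.compile(
    r"(<[a-z]+day>)" + ",".join([TIME] * 4) + r"(</[a-z]+day>)"
)


def reformat_hours(xml_text: str) -> str:
    """Rewrite comma-separated opening times into open/lunch/close elements."""
    # Lunch-break days: times 2 and 3 become the lunch range.
    text = FOUR_TIMES.sub(
        r"\1<open>\2</open><lunch>\3 - \4</lunch><close>\5</close>\6", xml_text
    )
    # Days without a lunch break: same replace string as the solution.
    text = TWO_TIMES.sub(
        r"\1<open>\2</open><lunch></lunch><close>\3</close>\4", text
    )
    return text


if __name__ == "__main__":
    print(reformat_hours("<monday>10.00,17.00</monday>"))
    print(reformat_hours("<monday>09.00,12.30,13.30,17.00</monday>"))
```

Neither pattern can match the other form, because each one requires the closing day tag immediately after its last time, so the order of the two substitutions should not matter.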