regex xml wildcard kettle

Regular expression to delete XML element names


To build a fairly complex XML document, I used "place-holder" elements. Once the XML is ready, I need to delete those place-holder tags while keeping their contents.

Sample Input

<consumers>
  <place-holder_1>
    <consumer>
      <val>1</val>
    </consumer>
  </place-holder_1>
  <place-holder_2>
    <consumer-info>
      <val>2</val>
    </consumer-info>
  </place-holder_2>
</consumers>

Sample Output

<consumers>
  <consumer>
    <val>1</val>
  </consumer>
  <consumer-info>
    <val>2</val>
  </consumer-info>
</consumers>

Basically, I am looking for a regex that deletes, in a generic way, every tag whose name contains "place-holder". The 'place-holder' tag can carry any number from 1 to 10 as a suffix.

I am struggling to come up with a regex for this.


Solution

  • The following regex matches the opening and closing place-holder tags, together with their leading whitespace:

    ^\s*<\/?place-holder_\d{1,2}>

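    The pattern has no capturing group, so replace each whole match with an empty string. Because it is anchored with ^, enable multiline mode so that ^ matches at the start of every line. Note that \d{1,2} accepts any suffix from 0 to 99, which is a superset of the 1 to 10 range in the question.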
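
    The replacement can be sketched in Python as follows. This is only an illustration, assuming the XML is available as a string; the names PLACEHOLDER_TAG and strip_placeholders are made up for this example and are not part of Kettle. The pattern is a slight variation on the one above: it uses [ \t]* instead of \s* so the match stays on one line, and it also consumes the trailing line break so no empty lines are left behind.

```python
import re

# Hypothetical helper names; the pattern is a variation on the regex above.
# [ \t]* keeps the match on a single line, and the optional \r?\n at the end
# removes the line break so the deleted tag does not leave an empty line.
PLACEHOLDER_TAG = re.compile(
    r"^[ \t]*</?place-holder_\d{1,2}>[ \t]*\r?\n?",
    re.MULTILINE,  # let ^ match at the start of every line
)


def strip_placeholders(xml_text: str) -> str:
    """Delete opening and closing place-holder tags, keeping their contents."""
    return PLACEHOLDER_TAG.sub("", xml_text)
```

    Calling strip_placeholders on the sample input removes the four place-holder lines. The child elements keep their original indentation, so the result is not re-indented the way the sample output is. To accept only the suffixes 1 to 10, \d{1,2} could be swapped for (?:10|[1-9]).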