I am trying to come up with a regex for an XML document that is essentially a DASH MPD file. The document has an AdaptationSet tag, which in turn can contain multiple Representation tags, as in the sample below. I need to match every Representation tag whose bandwidth attribute is at or above a specified value, e.g. the Representations with bandwidth 2000000 and 4000000 in the sample. I came up with the regex below, but it does not handle the case where the attributes span multiple lines, as in the Representation with id=1.
RANGE in the regex is a single digit from 1 to 9; assume it is an integer ready to be substituted into the pattern. RANGE followed by six digits makes the pattern match a 7-digit bandwidth value starting with that digit (e.g. 1000000, 2000000 or 3000000 when RANGE is 1, 2 or 3 respectively).
regex:
<[Rr]epresentation.*?[Bb]andwidth="0?[%(RANGE)]\d{6}"[\s\S]*?[Rr]epresentation>
<AdaptationSet segmentAlignment="true" maxWidth="1280" maxHeight="720" maxFrameRate="24" par="16:9">
<Representation id="1"
mimeType="video/mp4"
codecs="avc1.4d401f"
width="512"
height="288"
frameRate="24"
sar="1:1"
startWithSAP="1"
bandwidth="1000000">
<SegmentTemplate timescale="12288" duration="61440" media="BBB_512_640K_video_$Number$.mp4" startNumber="1" initialization="BBB_512_640K_video_init.mp4" />
</Representation>
<Representation id="2" mimeType="video/mp4" codecs="avc1.4d401f" width="512" height="288" frameRate="24" sar="1:1" startWithSAP="1" bandwidth="2000000">
<SegmentTemplate timescale="12288" duration="61440" media="BBB_512_640K_video_$Number$.mp4" startNumber="1" initialization="BBB_512_640K_video_init.mp4" />
</Representation>
<Representation id="3" mimeType="video/mp4" codecs="avc1.4d401f" width="768" height="432" frameRate="24" sar="1:1" startWithSAP="1" bandwidth="4000000">
<SegmentTemplate timescale="12288" duration="61440" media="BBB_768_1440K_video_$Number$.mp4" startNumber="1" initialization="BBB_768_1440K_video_init.mp4" />
</Representation>
</AdaptationSet>
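You can use the regex below. It replaces .*? with [^>]*?: a negated character class matches newlines, so the attribute list may span multiple lines, and because it cannot cross a >, the bandwidth check stays inside a single opening tag. With [2-9] it matches 7-digit bandwidth values from 2000000 to 9999999 (optionally with a leading 0); values with eight or more digits are not covered.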
<[Rr]epresentation[^>]*?[Bb]andwidth="0?[2-9]\d{6}"[\s\S]*?[Rr]epresentation>
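To check the pattern against multi-line attributes, here is a minimal Python sketch. The trimmed MPD string and the helper name representation_pattern are assumptions made for illustration; the regex itself is the one above, with the lower bound of the digit class made configurable.

```python
import re

# Trimmed-down sample (an assumption for illustration); ids 1 and 3
# spread their attributes over several lines.
MPD = '''<AdaptationSet segmentAlignment="true">
<Representation id="1"
mimeType="video/mp4"
bandwidth="1000000">
<SegmentTemplate timescale="12288" />
</Representation>
<Representation id="2" mimeType="video/mp4" bandwidth="2000000">
<SegmentTemplate timescale="12288" />
</Representation>
<Representation id="3"
mimeType="video/mp4"
bandwidth="4000000">
<SegmentTemplate timescale="12288" />
</Representation>
</AdaptationSet>'''


def representation_pattern(min_digit):
    """Hypothetical helper: compile the regex for leading digits min_digit..9."""
    if not 1 <= min_digit <= 9:
        raise ValueError("min_digit must be between 1 and 9")
    # [^>]*? matches newlines but cannot leave the opening tag.
    return re.compile(
        r'<[Rr]epresentation[^>]*?[Bb]andwidth="0?[%d-9]\d{6}"'
        r'[\s\S]*?[Rr]epresentation>' % min_digit
    )


def matching_ids(min_digit, text=MPD):
    """Return the id attribute of every Representation the pattern matches."""
    matches = representation_pattern(min_digit).findall(text)
    return [re.search(r'id="(\d+)"', m).group(1) for m in matches]


print(matching_ids(2))  # ['2', '3']
```

Note that the pattern expects every Representation to have a separate closing tag; a self-closing Representation element would let the lazy [\s\S]*? run on to the next closing tag. If that can occur in your MPD files, an XML parser is likely the safer tool.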