
shell - How to match content between XML tags?


I have this file:

<?xml version="1.0" encoding="utf-8"?>
<response>
        <Count>1</Count>
        <Messages>
                <Message>
                        <Smstat>0</Smstat>
                        <Index>40001</Index>
                        <Phone>234</Phone>
                        <Content>Poin Bonstri kamu: 358

Sisa Kuota kamu :
Kuota WA.Line 18 MB s.d 06&#x2F;08&#x2F;2019 19:33:46
Kuota Reguler 1478 MB s.d 02&#x2F;08&#x2F;2019 05:36:44
Temukan beragam paket lain di bima+ https:&#x2F;&#x2F;goo.gl&#x2F;RQ1DBA</Content>
                        <Date>2019-08-01 13:28:04</Date>
                        <Sca></Sca>
                        <SaveType>4</SaveType>
                        <Priority>0</Priority>
                        <SmsType>2</SmsType>
                </Message>
        </Messages>
</response>

I want to match the text between <Content> and </Content>. I've tried:

tr '\n' ' ' < input_file | grep -E "^<Content>.*</Content>$"

But it doesn't work. Note that I use the ash shell rather than bash. How can I do this?


Solution

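  • The attempt fails because ^ and $ anchor the pattern to the whole line, and after tr the single joined line begins with <?xml and ends after </response>, not with the <Content> tags. If your grep supports PCRE (the -P option, as in GNU grep), you could use a positive lookbehind and lookahead, with -o to print only the matched part: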

    $ tr '\n' ' ' < file | grep -Po "(?<=<Content>).*(?=</Content>)"
    

    Output:

    Poin Bonstri kamu: 358  Sisa Kuota kamu : Kuota WA.Line 18 MB s.d 06&#x2F;08&#x2F;2019 19:33:46 Kuota Reguler 1478 MB s.d 02&#x2F;08&#x2F;2019 05:36:44 Temukan beragam paket lain di bima+ https:&#x2F;&#x2F;goo.gl&#x2F;RQ1DBA
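
If grep -P is not available (BusyBox grep, which often accompanies ash, typically does not provide it), the same idea can be sketched with tr and sed, both of which are POSIX tools. This is only a sketch under stated assumptions: the function names extract_content and decode_slashes and the sample file path are hypothetical, and the file is assumed to contain a single <Content> element, since the greedy .* would otherwise run from one element into the next.

```shell
#!/bin/sh
# Hypothetical helper: print the text between <Content> and </Content>.
# Assumes exactly one <Content> element in the file passed as $1.
extract_content() {
    # Join all lines into one, then keep only the captured group.
    tr '\n' ' ' < "$1" | sed -n 's/.*<Content>\(.*\)<\/Content>.*/\1/p'
}

# Hypothetical helper: turn the &#x2F; entity back into a slash.
# Only this one entity is handled; it is not a general XML decoder.
decode_slashes() {
    sed 's|&#x2F;|/|g'
}

# Small sample file for demonstration (the path is an assumption).
sample="${TMPDIR:-/tmp}/sample_sms.xml"
cat > "$sample" <<'EOF'
<response>
        <Message>
                <Content>line one

line two 06&#x2F;08&#x2F;2019</Content>
                <Date>2019-08-01 13:28:04</Date>
        </Message>
</response>
EOF

extract_content "$sample" | decode_slashes
```

For anything beyond a quick one-off like this, a real XML parser is usually the safer choice, because regular expressions do not handle nesting, attributes, or CDATA sections.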