Tags: xml, bash, parsing, openwrt

Bash - How to get multi line text between XML tags


I have a text file with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<content>Pulsa:Rp200,Bonus:0 s&#x2F;d 12-JUL-17. 1GB Rp10rb.Mau?
1. Mau
2. Info
3. Internet
4. RAMADHAN HOTSALE
5. Nelpon
6. SMS
7. BB
8. NEW:UNLIMITED INTERNET
9. Roaming
10. 100MB2K</content>
</response>

I want to extract the text between <content> and </content>. I have tried:

grep -oP '(?<=<content> ).*?(?= </content>)' file

But it doesn't output anything. I want the end result to look like this:

Pulsa:Rp200,Bonus:0 s&#x2F;d 12-JUL-17. 1GB Rp10rb.Mau?
1. Mau
2. Info
3. Internet
4. RAMADHAN HOTSALE
5. Nelpon
6. SMS
7. BB
8. NEW:UNLIMITED INTERNET
9. Roaming
10. 100MB2K

How can I do this?


Solution

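  • With GNU grep and a Perl regular expression (-P). grep normally matches one line at a time, so a pattern cannot span the line breaks inside <content>; -z makes grep treat the whole file as a single record, which lets the pattern match across newlines. Note that the lookarounds must not contain the extra spaces used in the attempt above, because the file has no space after <content> or before </content>: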

    grep -Poz '(?<=<content>)(.*\n)*.*(?=</content>)' file.xml
    

    Output:

    Pulsa:Rp200,Bonus:0 s&#x2F;d 12-JUL-17. 1GB Rp10rb.Mau?
    1. Mau
    2. Info
    3. Internet
    4. RAMADHAN HOTSALE
    5. Nelpon
    6. SMS
    7. BB
    8. NEW:UNLIMITED INTERNET
    9. Roaming
    10. 100MB2K
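
  • Without GNU grep: the grep on an OpenWrt box is often the BusyBox one, which may not support -P. As a fallback, here is a minimal awk sketch of the same extraction. The function name extract_content is hypothetical, and the sketch assumes the file contains a single, non-nested <content> element; it is a line-based text filter, not an XML parser, and it does not decode entities such as &#x2F;.

```shell
# Hypothetical helper: print the text between <content> and </content>.
# Assumes one non-nested <content> element; works whether the element
# spans several lines or sits on a single line.
extract_content() {
    awk '
        # Opening tag: start capturing and drop everything up to the tag.
        /<content>/ { inside = 1; sub(/.*<content>/, "") }

        # Closing tag: drop the tag and what follows, print, stop capturing.
        inside && /<\/content>/ {
            sub(/<\/content>.*/, "")
            print
            inside = 0
            next
        }

        # Any other line inside the element is printed unchanged.
        inside
    ' "$1"
}
```

    Usage: extract_content file.xml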