php xml stomp

PHP: repeatedly remove lines after specific text to find XML


I have working code that retrieves STOMP messages, and working code that processes the XML once I have extracted it from a message.

The challenge is to strip out the cruft from the message and get only the xml.

Here is a sample of the stomp message (yes the data is the same, but that's not relevant here):

MESSAGE
_HQ_ORIG_ADDRESS:jms.queue.edu
timestamp:1339716293764
redelivered:false
_HQ_ORIG_MESSAGE_ID:xxxxxxxx
expires:0
subscription:subscription/jms.queue.edu
priority:4
message-id:xxxxxxxxxx
destination:jms.queue.edu

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><create><sourceMessageId>4454</sourceMessageId><messageId>3038</messageId><course> <batchUid>ASIA.355.921.2012S1.6733</batchUid><title>ASIA355-921-Chinese Cinema</title><startDate>2012-06-18-07:00</startDate><endDate>2012-09- 21-07:00</endDate><mappedNodeBatchUid>9c0bc373-23a0-4e60-b201- efbbc9bb022e</mappedNodeBatchUid><available>false</available></course></create>
MESSAGE
_HQ_ORIG_ADDRESS:jms.queue.edu
timestamp:1339716293764
redelivered:false
_HQ_ORIG_MESSAGE_ID:xxxxxxxx
expires:0
subscription:subscription/jms.queue.edu
priority:4
message-id:xxxxxxxxxx
destination:jms.queue.edu

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><create><sourceMessageId>4454</sourceMessageId><messageId>3038</messageId><course> <batchUid>ASIA.355.921.2012S1.6733</batchUid><title>ASIA355-921-Chinese Cinema</title><startDate>2012-06-18-07:00</startDate><endDate>2012-09- 21-07:00</endDate><mappedNodeBatchUid>9c0bc373-23a0-4e60-b201- efbbc9bb022e</mappedNodeBatchUid><available>false</available></course></create>

What I want to do is remove all the lines in the message starting from "MESSAGE" up to and including the line break before each line that starts with the xml. This will give me the results required to parse using an xml parser:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?><create><sourceMessageId>4454</sourceMessageId><messageId>3038</messageId><course> <batchUid>ASIA.355.921.2012S1.6733</batchUid><title>ASIA355-921-Chinese Cinema</title><startDate>2012-06-18-07:00</startDate><endDate>2012-09- 21-07:00</endDate><mappedNodeBatchUid>9c0bc373-23a0-4e60-b201- efbbc9bb022e</mappedNodeBatchUid><available>false</available></course></create>
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?><create><sourceMessageId>4454</sourceMessageId><messageId>3038</messageId><course> <batchUid>ASIA.355.921.2012S1.6733</batchUid><title>ASIA355-921-Chinese Cinema</title><startDate>2012-06-18-07:00</startDate><endDate>2012-09- 21-07:00</endDate><mappedNodeBatchUid>9c0bc373-23a0-4e60-b201- efbbc9bb022e</mappedNodeBatchUid><available>false</available></course></create>

I tried:

$xmlstr = preg_replace("/MESSAGE(.*)jms.queue.edu$/ims",'',$msg);
$xmlstr = trim($xmlstr);

But that removes everything from the first occurrence of "MESSAGE" down to the headers of the last message, so every XML document except the last one is removed as well.

Any ideas? I've tried a variety of tricks, including regex, implode/explode, and writing to and reading from a file. I feel the preg_replace code above nearly works; it just needs to recognize ALL occurrences. I suspect it will involve a "while" or "foreach" loop, but I'm hoping for a nice, clean solution. Any help is most appreciated.


Solution

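  • Use ? after the * (that is, .*? instead of .*). This makes the quantifier lazy, so each match ends at the first possible point instead of the last. Note that in the sample message the first line ending in jms.queue.edu is the _HQ_ORIG_ADDRESS header, so the end of the pattern may also need to be more specific.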

    Alternatively, try this:

    list(,$body) = explode("\r\n\r\n",$msg); // adjust line ending as needed
    list($xmlstr) = explode("\r\n",$body);
    

    This gets the XML line of the first message only. If $msg holds several messages, the same split has to be repeated for each one.
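
For a string that holds several frames, the regex route can be sketched as follows. This is a minimal sketch, assuming each frame has the shape shown in the question: a MESSAGE line, header lines, one blank line, then a single-line XML body with no trailing NUL byte. Instead of ending the match at jms.queue.edu, the pattern ends at the blank line that separates headers from body. The function names and the shortened sample data are illustrative and not from the original post.

```php
<?php
// Illustrative helpers; names and sample data are assumptions, not from the post.

// Normalise CRLF / CR line endings to LF so one pattern covers all cases.
function normaliseNewlines(string $msg): string
{
    return str_replace(["\r\n", "\r"], "\n", $msg);
}

// Remove every "MESSAGE ... blank line" header block, leaving one body per line.
function stripFrameHeaders(string $msg): string
{
    // m: ^ matches at the start of each line; s: . also matches newlines.
    // .*? is lazy, so each match stops at the first blank line after MESSAGE.
    $out = preg_replace('/^MESSAGE\n.*?\n\n/ms', '', normaliseNewlines($msg));
    return trim((string) $out);
}

// Return the body line of every frame as an array of strings.
function extractXmlBodies(string $msg): array
{
    // Same header pattern, followed by a capture of the single body line.
    preg_match_all('/^MESSAGE\n.*?\n\n([^\n]*)/ms', normaliseNewlines($msg), $m);
    return $m[1];
}

// Assumed sample: two shortened frames with CRLF line endings.
$msg = "MESSAGE\r\n"
     . "_HQ_ORIG_ADDRESS:jms.queue.edu\r\n"
     . "priority:4\r\n"
     . "destination:jms.queue.edu\r\n"
     . "\r\n"
     . "<?xml version=\"1.0\"?><create><messageId>3038</messageId></create>\r\n"
     . "MESSAGE\r\n"
     . "_HQ_ORIG_ADDRESS:jms.queue.edu\r\n"
     . "priority:4\r\n"
     . "destination:jms.queue.edu\r\n"
     . "\r\n"
     . "<?xml version=\"1.0\"?><create><messageId>3039</messageId></create>";

$xmlstr = stripFrameHeaders($msg);   // all XML documents, one per line
$bodies = extractXmlBodies($msg);    // the same documents as an array
```

Each element of $bodies can then be passed to an XML parser on its own, which avoids parsing a string that contains more than one XML declaration.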