PHP Filter to filter parsed RSS/XML


I'm struggling to get a parsed RSS feed to filter on a few values within the description. It's a feed for traffic information, and I want to filter it so that it displays only certain roads instead of all of them. So I need the feed to show ONLY items whose road number is "A2", "A4", "N15", and so on.

Here's my code so far (don't mind the table part :-) ).

Any ideas that this noob can follow? Thanks in advance!

Bonus points for a "No results" message if there are no items (empty list)...

<?php
           $ch = curl_init("https://www.verkeerplaza.nl/rssfeed");
           curl_setopt($ch, CURLOPT_HEADER, 1);  
           curl_setopt($ch,CURLOPT_TIMEOUT, 30);  
           curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
           curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, 0);  
           curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 0);  

           $result=curl_exec ($ch);
           $data = strstr($result, '<?');  
           $xml = new SimpleXMLElement($data);  

           // Close the handle that was opened above ($ch, not $curl)
           curl_close($ch);


$xml = simplexml_load_string($data, 'SimpleXMLElement', LIBXML_NOCDATA);
//die('<pre>' . print_r($xml, TRUE) . '</pre>');



$lastUpdate = $xml->channel->pubDate;

echo $lastUpdate;
echo "<table>";
echo "<tbody>";


for($i = 0; $i <5 ; $i++){
    
    $pubDate = $xml->channel->item[$i]->pubDate;
    $title = $xml->channel->item[$i]->title;
    $description = $xml->channel->item[$i]->description;
    $enclosure = $xml->channel->item[$i]->enclosure['url'];

    echo "<tr>";
      echo "<td>
            <img src='$enclosure' width='50'>
            </td>
            <td>
            &nbsp;&nbsp;
            </td>";
      echo "<td>
        <b>$title</b><br/>
        <small>$pubDate</small><br/>
        $description<br/><br/>
            </td>";
    echo "</tr>";
}
  echo "</tbody>";
echo "</table>";
?>


Solution

  • The most important changes you need to make are adding a way to filter the items against your list of roads, and changing the loop over the items so that it does not use a fixed number of iterations.

    From there, we can make a number of small improvements to make the code more readable and manageable. This is what I would consider a minimum viable solution. You could spend a lot of time making the templating and road matching more sophisticated, but this gets the job done, assuming that the road names always begin with your identifiable strings (A2, A4, N15, etc.).

    <?php
    // Test data
    $data = <<<END
    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
        <channel>
            <title>Verkeerplaza.nl RSS feed</title>
            <link>https://www.verkeerplaza.nl</link>
            <description>Alle actuele files en verkeersinformatie van Verkeerplaza.nl</description>
            <atom:link href="https://www.verkeerplaza.nl/rssfeed" rel="self" type="application/rss+xml"/>
            <pubDate>2021-06-16T19:13:21+00:00</pubDate>
            <copyright>Copyright (c) 2019 -2021, Moop Mobility B.V.</copyright>
            <atom:logo>https://www.verkeerplaza.nl/favicon.png</atom:logo>
            <logo>https://www.verkeerplaza.nl/favicon.png</logo>
            <item>
                <title>A67 Eindhoven &gt; Venlo Tussen Afrit Asten en Brug Brug over de Zuid-Willemsvaart</title>
                <link>https://verkeerplaza.nl/files/A67</link>
                <pubDate>2021-06-16T21:12:00+02:00</pubDate>
                <description>Ongeluk tussen Afrit Asten en Brug Brug over de Zuid-Willemsvaart</description>
                <enclosure length="0" type="image/jpeg" url="https://verkeerplaza.nl/images/map-icons/red/[email protected]"/>
            </item>
            <item>
                <title>A2 Amsterdam &gt; Utrecht Tussen Afrit Ouderkerk aan de Amstel en Afrit Amsterdam-Zuidoost</title>
                <link>https://verkeerplaza.nl/files/A2</link>
                <pubDate>2021-06-16T21:43:42+02:00</pubDate>
                <description>Ongeluk tussen Afrit Ouderkerk aan de Amstel en Afrit Amsterdam-Zuidoost</description>
                <enclosure length="0" type="image/jpeg" url="https://verkeerplaza.nl/images/map-icons/red/[email protected]"/>
            </item>
            <item>
                <title>A3 Eindhoven &gt; Foo</title>
                <link>https://verkeerplaza.nl/files/A67</link>
                <pubDate>2021-06-16T21:12:10+02:00</pubDate>
                <description>Brug over de Zuid-Willemsvaart</description>
                <enclosure length="0" type="image/jpeg" url="https://verkeerplaza.nl/images/map-icons/red/[email protected]"/>
            </item>
        </channel>
    </rss>
    END;
    
    // Load XML into SimpleXMLElement
    $xml = simplexml_load_string($data, 'SimpleXMLElement', LIBXML_NOCDATA);
    
    /*
     * Define a template for our row markup, insert token strings for the values
     */
    $rowTemplate = <<<END
    <tr>
        <td><img src="%IMG%" width="50"></td>
        <td>&nbsp;&nbsp;</td>
        <td>
            <b>%ROAD%</b><br/>
            <small>%DATE%</small><br/>
            %DESC%
            <br/><br/>
        </td>
    </tr>
    END;
    
    // Create an array of the tokens used in the template
    $tokens = ['%IMG%', '%ROAD%', '%DATE%', '%DESC%'];
    
    // Create an array of valid road names
    $targetRoads = ['A67', 'A2'];
    
    // String buffer for the row markup
    $rowMarkup = '';
    foreach($xml->channel->item as $currItem)
    {
        /*
         * Create an array of the values from the item that should be used
         * to replace tokens in the row template. This approach requires
         * the order of the values in this array to match the order of tokens
         * in the $tokens array. We can think about enhancing this later, but
         * it works for now.
         */
        $values = [
            $currItem->enclosure['url'],
            $currItem->title,
            $currItem->pubDate,
            $currItem->description
        ];
    
        // Create a flag to indicate whether or not the current item should be output as a row
        $valid = false;
    
        // Loop through the target road values
        foreach($targetRoads as $currRoad)
        {
            /*
             * The target road values are strings that should appear at the beginning
             * of the item title. The format of the feed is expected to consistently
             * include these strings at the beginning of each title.
             *
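             * Look for the string at the beginning of the title, followed by a
             * whitespace character. This way, target "A2" matches "A2 Amsterdam"
             * but not "A25 Amsterdam". preg_quote() escapes any regex
             * metacharacters in the road name, and the title is cast to a string
             * because SimpleXML returns an object.
             */
            preg_match('/^' . preg_quote($currRoad, '/') . '\s/', (string) $currItem->title, $matches);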
    
            // If $matches is not empty, we have a good hit
            if(!empty($matches))
            {
                // Set the valid flag and break out of the search loop.
                $valid = true;
                break;
            }
        }
    
        // If the current item contains a road we are interested in, output a row
        if($valid)
        {
            // Use simple string replacement to generate markup from the row template
            $rowMarkup .= str_replace($tokens, $values, $rowTemplate);
        }
    }
    
    // Row to display in the table if no results are found.
    $noResultsRow = <<<END
    <tr>
        <td>No results</td>
    </tr>
    END;
    
    // If we have no results, assign the value of $noResultsRow to the $rowMarkup variable
    $rowMarkup = (empty($rowMarkup)) ? $noResultsRow:$rowMarkup;
    
    // Format the pub date of the feed into something pretty
    $lastUpdate = $xml->channel->pubDate;
    $formattedDate = date('F j, Y g:i A', strtotime($lastUpdate));
    
    // Create the final markup
    $output = <<<END
    <p>Last update: $formattedDate</p>
    <table>
        <tbody>
            $rowMarkup
        </tbody>
    </table>
    END;
    
    echo $output.PHP_EOL;
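
If you would rather keep the matching logic in one place, the inner loop over the target roads can be folded into a single regular expression. The following is only a minimal sketch of that idea, not a replacement for the code above: the function name `filterItemsByRoad` and the small sample feed are made up for illustration, and it assumes PHP 7.4 or later (arrow functions) and that the road number is always the first word of the title.

```php
<?php
// Hypothetical helper (not part of the solution above): returns the feed
// items whose title starts with one of the given road numbers.
function filterItemsByRoad(SimpleXMLElement $xml, array $roads): array
{
    // With no target roads there is nothing to match
    if ($roads === []) {
        return [];
    }

    // Escape each road number and join them into one alternation,
    // e.g. /^(?:A2|A67)\s/ so "A2" does not match "A25"
    $quoted  = array_map(fn (string $road): string => preg_quote($road, '/'), $roads);
    $pattern = '/^(?:' . implode('|', $quoted) . ')\s/';

    $hits = [];
    foreach ($xml->channel->item as $item) {
        // Cast to string: SimpleXML hands back objects, not plain strings
        if (preg_match($pattern, (string) $item->title) === 1) {
            $hits[] = $item;
        }
    }

    return $hits;
}

// Made-up sample feed, only for demonstration
$sample = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <item><title>A2 Amsterdam &gt; Utrecht</title></item>
        <item><title>A25 Example</title></item>
        <item><title>A67 Eindhoven &gt; Venlo</title></item>
    </channel>
</rss>
XML;

$xml    = simplexml_load_string($sample);
$hits   = filterItemsByRoad($xml, ['A2', 'A67']);
$titles = array_map(fn (SimpleXMLElement $item): string => (string) $item->title, $hits);

// Print the matching titles, or a fallback message for an empty list
echo $hits === [] ? 'No results' : implode(PHP_EOL, $titles), PHP_EOL;
```

The returned items are ordinary `SimpleXMLElement` objects, so they could be fed into the same row template and `str_replace()` call used above. If the feed content is not trusted, wrapping each value in `htmlspecialchars()` before inserting it into the markup would be a sensible addition.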