Tags: php, xml, simplexml, cdata

How to read this XML? I get "parser error : CData section not finished"


I'm trying to read this XML RSS feed, http://noticias.perfil.com/feed/, but without success. I get these errors:

    Warning: simplexml_load_file(): http://noticias.perfil.com/feed/:232: parser error : CData section not finished <p>La sola lectura de los datos estadísticos desp in D:\xampp\FerreWoo\scrap-rvnot.php on line 43

    Warning: simplexml_load_file(): Isis, con lo que habría logrado un nuevo respaldo a sus proyectos terroristas. in D:\xampp\FerreWoo\scrap-rvnot.php on line 43

    Warning: simplexml_load_file(): ^ in D:\xampp\FerreWoo\scrap-rvnot.php on line 43

I'm using this code:

   $feed = simplexml_load_file($urls, null, LIBXML_NOCDATA);

I tried cURL too, but the same errors keep coming.

I know the XML file may be malformed, but there must be a way to read it, right?
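To see exactly where libxml gives up, the parse errors can be collected as objects instead of being printed as warnings. This is a minimal sketch; the helper name `load_xml_with_errors` is hypothetical, and the sample string with a `\x01` control character inside CDATA is made up for illustration. It only uses `libxml_use_internal_errors()`, `libxml_get_errors()` and `libxml_clear_errors()`.

```php
<?php
// Hypothetical helper: parse a string and return the SimpleXML result together
// with the libxml errors, without emitting PHP warnings.
function load_xml_with_errors(string $xmlString): array
{
    $previous = libxml_use_internal_errors(true); // collect errors internally
    libxml_clear_errors();
    $xml = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);
    $errors = libxml_get_errors();               // array of LibXMLError objects
    libxml_clear_errors();
    libxml_use_internal_errors($previous);       // restore the previous setting
    return [$xml, $errors];
}

// Made-up sample: \x01 is not allowed anywhere in an XML 1.0 document.
[$xml, $errors] = load_xml_with_errors("<rss><item><![CDATA[bad\x01char]]></item></rss>");

foreach ($errors as $error) {
    printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
}
```

The line and column reported for the real feed point at the offending byte, which makes it easier to confirm whether it is an invalid character or a genuinely unterminated CDATA section.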


Solution

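  • The feed contains characters that are not allowed in XML 1.0 (for example, control characters), and libxml stops at them while reading the CDATA section. Strip them before parsing with the code below: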

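    $url = 'http://noticias.perfil.com/feed/';
    $xmlString = file_get_contents($url);

    // Remove characters outside the XML 1.0 Char range (#x9 | #xA | #xD |
    // #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF). PCRE needs the \x{...}
    // syntax and the /u modifier for code points above 0xFF.
    $invalid_characters = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $xmlString = preg_replace($invalid_characters, '', $xmlString);
    // preg_replace() returns null when the input is not valid UTF-8
    if ($xmlString === null) {
        die('The feed is not valid UTF-8');
    }

    $xml = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);

    // Test purpose only: dump the parsed feed as an array
    $encode = json_encode($xml);
    $decode = json_decode($encode, true);
    print_r($decode);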
    

    Hope it helps
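The same cleanup can be wrapped in a reusable function and checked offline. This is a sketch under the assumption that the feed is UTF-8; the function name `strip_invalid_xml_chars` and the small RSS-like sample (with a vertical tab `\x0B` inside CDATA) are hypothetical, not taken from the real feed.

```php
<?php
// Hypothetical helper: removes characters not allowed by the XML 1.0 Char production.
// Assumes UTF-8 input; with /u, preg_replace() returns null on invalid UTF-8.
function strip_invalid_xml_chars(string $xml): string
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($pattern, '', $xml);
    if ($clean === null) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $clean;
}

// Made-up sample: a vertical tab (\x0B) inside a CDATA section.
$raw = "<rss><channel><item><description><![CDATA[<p>La sola lectura\x0B de los datos estadísticos</p>]]></description></item></channel></rss>";

// Collect libxml errors instead of printing warnings.
libxml_use_internal_errors(true);

// Parsing the raw string fails, so simplexml_load_string() returns false.
$broken = simplexml_load_string($raw, 'SimpleXMLElement', LIBXML_NOCDATA);

// After cleaning, the document parses and accented characters are kept.
$xml = simplexml_load_string(strip_invalid_xml_chars($raw), 'SimpleXMLElement', LIBXML_NOCDATA);
echo (string) $xml->channel->item->description, "\n";

libxml_clear_errors();
```

Only the offending control character is dropped; tabs, newlines and non-ASCII text such as "estadísticos" pass through unchanged.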