Using PHP, I want to remove the CDATA blocks inside the script
elements of an HTML file.
<script type="text/javascript">
/* <![CDATA[ */
var A=new Array();
..........................
..........................
/* ]]> */
</script>
some text2 ........................
some text3 ........................
some text4 ........................
<script type="text/javascript">
/* <![CDATA[ */
var B=new Array();
..........................
..........................
/* ]]> */
</script>
some text5 ........................
I haven't found a way to select and remove these nodes with XPath and PHP's DOMDocument.
I tried this regular expression:
$re = '/\/\*\s*<!\[CDATA\[[\s\S]*\/\*\s*\]\]>\s*\*\//i';
But this removes everything from the first CDATA opening to the last CDATA closing, including the text between the two CDATA blocks.
As a result, I get an empty string instead of:
some text2 ........................
some text3 ........................
some text4 ........................
some text5 ........................
Any ideas?
Update with ThW's solution:
With this page, it seems that the text of the CDATA section is not parsed correctly:
libxml_use_internal_errors(true);
$domDoc = new DOMDocument();
$domDoc->loadHTMLFile('https://www.maisons-qualite.com/le-reseau-mdq/recherche-constructeurs-agrees/construction-maison-neuve-centre-val-loire');
libxml_clear_errors();
$xpath = new DOMXpath($domDoc);
foreach ($xpath->evaluate('//text()') as $section) {
    if ($section instanceof DOMCDATASection) {
        print_r($section->textContent);
        $section->parentNode->removeChild($section);
    }
}
$content = $domDoc->saveHTML();
I got this textContent:
.....
.....
function updateConstructeurs(list) {
for (var i in list) {
if(list[i]['thumbnail']) {
jQuery('#reseau-constructeurs').append('<div class="reseau-constructeur">' +
'<div class="img" style="background-image:url(' + list[i]['thumbnail'] + ')">
for this original source:
function updateConstructeurs(list) {
for (var i in list) {
if(list[i]['thumbnail']) {
jQuery('#reseau-constructeurs').append('<div class="reseau-constructeur">' +
'<div class="img" style="background-image:url(' + list[i]['thumbnail'] + ')"></div>' +
'<h3>' + list[i]['title'] + '</h3>' +
'<a class="btn purple" href="' + list[i]['link'] + '">Accéder à la fiche</a>' +
'</div>');
}
}
}
As a result, instead of getting an empty string, we get:
'<h3>' + list[i]['title'] + '</h3>' +
'<a class="btn purple" href="'%20+%20list%5Bi%5D%5B'link'%5D%20+%20'">Accéder à la fiche</a>' +
'</div>');
}
}
}
/* ]]> */
Make the [\s\S]* non-greedy, i.e. [\s\S]*?, so that each match stops at the first closing marker instead of the last one:
\/\*\s*<!\[CDATA\[[\s\S]*?\/\*\s*\]\]>\s*\*\/
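As a rough illustration, the non-greedy pattern can be applied with preg_replace as sketched below. The $html sample is a hypothetical, shortened version of the markup in the question, and the variable names are only for illustration. Note that this removes the commented CDATA blocks but leaves the (now empty) script elements in place.

```php
<?php
// Hypothetical sample input, shortened from the markup in the question.
$html = <<<'EOT'
<script type="text/javascript">
/* <![CDATA[ */
var A=new Array();
/* ]]> */
</script>
some text2
<script type="text/javascript">
/* <![CDATA[ */
var B=new Array();
/* ]]> */
</script>
some text5
EOT;

// [\s\S]*? is lazy: it expands only until the first "/* ]]> */" that follows
// each "/* <![CDATA[", so the two blocks are matched separately.
$re = '/\/\*\s*<!\[CDATA\[[\s\S]*?\/\*\s*\]\]>\s*\*\//i';

// Replace every matched block with an empty string.
$clean = preg_replace($re, '', $html);

echo $clean;
```

With the greedy [\s\S]* the single match would run from the first opening marker to the last closing marker, which is why the text between the blocks disappeared.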