Is there a way to remove unwanted text when using getElementsByTagName? For example,
this gets the published date of a movie for my site:
$spans = $dom->getElementsByTagName('span');
for($i=0; $i < $spans->length; $i++){
    $itemprop = $spans->item($i)->getAttribute("itemprop");
    if ($itemprop == "datePublished"){
        if ($spans->item($i)->textContent!='-'){
            $res['published'] = trim($spans->item($i)->textContent);
        }
    }
}
But instead of getting this:
12 July 2011
it gets this:
12 July 2011 10:47 PM, UTC
Is there any code I could add to remove this part?
10:47 PM, UTC
You could use a regular expression to pull out the value:
preg_match('/^\d+ \w+ \d+/', trim($spans->item($i)->textContent), $matches);
// The pattern has no capture group, so the whole match is in $matches[0]
$published_date = $matches[0];
As long as the format of the date doesn't change, you shouldn't have a problem. A better idea, however, is to parse it with DateTime::createFromFormat, which returns false when the string does not match the format. This should be correct:
$published_date = DateTime::createFromFormat("d M Y h:i A, e", $spans->item($i)->textContent);
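To compare the two approaches outside the DOM loop, here is a minimal standalone sketch. It assumes the text always looks like the sample string from the question; the variable names ($text, $regex_date, $parsed, $parsed_date) are only for illustration.

```php
<?php
// Sample string taken from the question.
$text = "12 July 2011 10:47 PM, UTC";

// Approach 1: regular expression.
// With no capture group in the pattern, the whole match is in $matches[0].
$regex_date = null;
if (preg_match('/^\d+ \w+ \d+/', $text, $matches)) {
    $regex_date = $matches[0];
}

// Approach 2: parse the full string, then format only the date part.
$parsed = DateTime::createFromFormat("d M Y h:i A, e", $text);
// createFromFormat() returns false when the string does not match the format.
$parsed_date = ($parsed !== false) ? $parsed->format("j F Y") : null;

echo $regex_date, "\n";
echo $parsed_date, "\n";
```

The regex version keeps whatever text was on the page, while the DateTime version rejects strings that are not in the expected format and lets you pick the output format yourself.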
Edit: Updated original code from question with recommended changes:
$spans = $dom->getElementsByTagName('span');
for($i=0; $i < $spans->length; $i++){
    $itemprop = $spans->item($i)->getAttribute("itemprop");
    if ($itemprop == "datePublished"){
        $text_content = trim($spans->item($i)->textContent);
        if ($text_content != '-'){
            $published_date = DateTime::createFromFormat("d M Y h:i A, e", $text_content);
            // createFromFormat() returns false if the text does not match the format
            if ($published_date !== false){
                // "j F Y" gives "12 July 2011"; "d M Y" would give "12 Jul 2011"
                $res['published'] = $published_date->format("j F Y");
            }
        }
    }
}
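If you want to try this without the real page, the following is a self-contained sketch of the same idea. The $html string is made-up markup standing in for the movie page (an assumption, not the actual site), and it uses foreach over the node list instead of an index loop.

```php
<?php
// Hypothetical markup standing in for the real movie page.
$html = '<div><span itemprop="name">Example Movie</span>'
      . '<span itemprop="datePublished"> 12 July 2011 10:47 PM, UTC </span></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$res = array();
foreach ($dom->getElementsByTagName('span') as $span) {
    // Skip every span that is not the published date.
    if ($span->getAttribute('itemprop') !== 'datePublished') {
        continue;
    }
    // Trim first so surrounding whitespace does not break the parse.
    $text = trim($span->textContent);
    $date = DateTime::createFromFormat("d M Y h:i A, e", $text);
    // Only store the value when parsing succeeded.
    if ($date !== false) {
        $res['published'] = $date->format("j F Y");
    }
}

echo isset($res['published']) ? $res['published'] : 'not found', "\n";
```

A placeholder such as '-' simply fails to parse here, so $res['published'] stays unset for it.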