I'm trying to pull the alt tag from an image in an XML node titled <description>. Here is the content of the node: <img src="xxx.png" alt="WHAT I WANT TO GRAB">. I've tried to create a PHP function to extract that, but it's not working. Where am I going wrong? For reference, here is the XML feed.
What I'm using to set Alt tag in the import:
[get_alt_tags_from_xml({description[1]})]
What I'm using for a function:
function get_alt_tags_from_xml($content) {
//The content
$html = file_get_html($content);
//Run on all images
foreach($html->find('img') as $element)
echo $element->alt . ', ';
}
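Consider using DOMDocument and DOMXPath. Load the HTML, find the img tags with XPath, then extract the attribute values. Note that the function in the question echoes its result instead of returning it, and file_get_html (assuming it is the Simple HTML DOM function) expects a URL or file path rather than an HTML string. It is also better to join the values with PHP_EOL instead of a comma, because alt text can itself contain commas.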
The function should support extracting both the src and alt attributes, so that it can provide two lists of the same size representing the images. WP All Import will use that information as the image data source and to populate the alternative text field.
function get_img_attrs_from_html( $content, $attribute_name ) {
    if ( empty( $content ) || empty( $attribute_name ) ) {
        return '';
    }
    $dom = new DOMDocument;
    @$dom->loadHTML( $content );
    $dxp = new DOMXPath( $dom );
    $images = $dxp->query( '//img' );
    $values = array_map( function( $img ) use ( $attribute_name ) {
        return trim( $img->getAttribute( $attribute_name ) );
    }, iterator_to_array( $images ) );
    return join( PHP_EOL, $values );
}
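The function above silences parser warnings with the @ operator. As a minimal sketch of an alternative, the variant below collects libxml warnings through libxml_use_internal_errors() and then discards them. The name get_img_attrs_quiet is hypothetical and used only for illustration, and the sketch walks the img elements with getElementsByTagName() rather than XPath.

```php
// Hypothetical variant of get_img_attrs_from_html(); the name is illustrative only.
function get_img_attrs_quiet( $content, $attribute_name ) {
    if ( empty( $content ) || empty( $attribute_name ) ) {
        return '';
    }

    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors( true );

    $dom = new DOMDocument;
    $dom->loadHTML( $content );

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    // One entry per <img>, in document order; a missing attribute yields ''.
    $values = array();
    foreach ( $dom->getElementsByTagName( 'img' ) as $img ) {
        $values[] = trim( $img->getAttribute( $attribute_name ) );
    }

    return join( PHP_EOL, $values );
}

echo get_img_attrs_quiet( '<img src="x.png" alt="A"><img src="y.png">', 'src' ) . PHP_EOL;
```

Restoring the previous libxml setting keeps the change local to this call, which may matter if other code in the same request relies on libxml's default error reporting.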
The content_encoded item node from the Simplifyingthemarket feed is preferable, as it contains more images than the description tag.
Under the Images section of WP All Import's Edit Template page:
"Download images hosted elsewhere" should be selected and contain the source list:
[get_img_attrs_from_html({content_encoded[1]},"src")]
The separator field in "Enter image URL one per line, or separate them with a" must be blank.
"Scan through post content and import images wrapped in <img> tags" must be enabled.
"SEO & Advanced Options" -> "Set Alt Text(s)" should be selected and provide the alternative text list:
[get_img_attrs_from_html({content_encoded[1]},"alt")]
The separator field in "Enter one per line, or separate them with a" must be blank.
Once these settings are saved the import can be started.
The attribute extraction function can be tested outside of an import:
$item = <<<XML
<item><content_encoded><![CDATA[
text
<img src="x.png" alt="WHAT I WANT TO GRAB">
<p>
<img src="no-alt.png">
<strong>tag</strong>
<img src="y.png" alt="ANOTHER, AN ATTRIBUTE, SHOULD BE GRABBED">
</p>
]]></content_encoded></item>
XML;
$idom = new DOMDocument;
@$idom->loadXML( $item );
$cnode = ( new DOMXPath( $idom ) )->query( '//content_encoded' );
$content = $cnode->item( 0 )->textContent;
echo get_img_attrs_from_html( $content, 'src' ) . PHP_EOL;
echo get_img_attrs_from_html( $content, 'alt' ). PHP_EOL;
Output (the empty line corresponds to the image without an alt attribute):
x.png
no-alt.png
y.png
WHAT I WANT TO GRAB

ANOTHER, AN ATTRIBUTE, SHOULD BE GRABBED
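Because an image without an alt attribute contributes an empty line, the src list and the alt list always have the same number of lines. As a rough check, and assuming the two lists are meant to be matched line by line, the sketch below pairs them up by position. The helper pair_src_with_alt is hypothetical and is not part of WP All Import.

```php
// Hypothetical helper for checking alignment; not part of WP All Import.
function pair_src_with_alt( $src_list, $alt_list ) {
    $srcs = explode( PHP_EOL, $src_list );
    $alts = explode( PHP_EOL, $alt_list );

    // If the line counts differ, the lists are out of step and cannot be paired.
    if ( count( $srcs ) !== count( $alts ) ) {
        return array();
    }

    // array_map with a null callback zips the arrays into [src, alt] pairs.
    return array_map( null, $srcs, $alts );
}

// The same values the test above produces, including the empty alt line.
$src_list = join( PHP_EOL, array( 'x.png', 'no-alt.png', 'y.png' ) );
$alt_list = join( PHP_EOL, array( 'WHAT I WANT TO GRAB', '', 'ANOTHER, AN ATTRIBUTE, SHOULD BE GRABBED' ) );

foreach ( pair_src_with_alt( $src_list, $alt_list ) as $pair ) {
    echo $pair[0] . ' => "' . $pair[1] . '"' . PHP_EOL;
}
```

Joining with a comma instead would break this pairing for the third image, since its alt text contains commas of its own.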