
ALT Tag in WP All Import Using PHP Function


I'm trying to pull the alt attribute of an image inside an XML node named <description>. The node contains <img src="xxx.png" alt="WHAT I WANT TO GRAB">. I've tried to write a PHP function to extract it, but it's not working. Where am I going wrong? For reference, here is the XML feed.

What I'm using to set the alt text in the import:

[get_alt_tags_from_xml({description[1]})]

The function I'm using:

function get_alt_tags_from_xml($content) {
    //The content
    $html = file_get_html($content);
    //Run on all images
    foreach($html->find('img') as $element)
    echo $element->alt . ', ';
    }
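A hedged reading of why this attempt returns nothing: file_get_html() (presumably from the Simple HTML DOM library) loads a file path or URL rather than parsing an HTML string, for which that library offers str_get_html(). Also, the function echoes the alt values instead of returning them, while WP All Import fills the field with the function's return value. The sketch shows the echo/return difference in plain PHP; alt_list_echo() and alt_list_return() are hypothetical helpers, not part of any plugin.

```php
<?php
// Hypothetical helper in the style of the question: it prints values.
function alt_list_echo( array $alts ) {
    foreach ( $alts as $alt ) {
        echo $alt . ', ';
    }
    // No return statement, so callers receive null.
}

// Hypothetical helper with the same input that returns a string instead.
function alt_list_return( array $alts ) {
    return implode( PHP_EOL, $alts );
}

ob_start();                                   // capture anything echoed
$from_echo = alt_list_echo( [ 'a', 'b' ] );   // the caller gets null
$printed   = ob_get_clean();                  // the text went to output instead

$from_return = alt_list_return( [ 'a', 'b' ] );  // the caller gets the text
```

Only the second form hands its result back to the caller that invoked it.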

Solution

  • Function

    Consider using DOMDocument and DOMXPath: load the HTML, find the img tags with an XPath query, then extract the requested attribute from each one. It is also better to join the values with PHP_EOL instead of ", ", because an alt text can itself contain commas.
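To illustrate why the separator matters, this small sketch joins two alt texts (taken from the test data further down) with ", " and with PHP_EOL, then splits them again. It only demonstrates the round trip; it is not code WP All Import runs.

```php
<?php
// Two alt texts; the second one itself contains commas.
$alts = [ 'WHAT I WANT TO GRAB', 'ANOTHER, AN ATTRIBUTE, SHOULD BE GRABBED' ];

// Joining with ", " and splitting again yields four pieces, not two.
$comma_joined = implode( ', ', $alts );
$comma_split  = explode( ', ', $comma_joined );

// Joining with PHP_EOL keeps one value per line, so both values survive.
$eol_joined = implode( PHP_EOL, $alts );
$eol_split  = explode( PHP_EOL, $eol_joined );

echo count( $comma_split ) . PHP_EOL; // 4
echo count( $eol_split ) . PHP_EOL;   // 2
```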

    The function should support extracting both the src and the alt attributes, so that it can produce two lists of the same size describing the same images. WP All Import uses the src list as the image data source and the alt list to populate the alternative text field.

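    function get_img_attrs_from_html( $content, $attribute_name ) {
        if ( empty( $content ) || empty( $attribute_name ) ) {
            return '';
        }
    
        // Collect libxml warnings about malformed markup instead of
        // printing them (a quieter alternative to the @ operator).
        $previous = libxml_use_internal_errors( true );
        $dom = new DOMDocument;
        // Without a declared charset, loadHTML() can misread UTF-8 text as
        // ISO-8859-1; the XML declaration makes it parse the input as UTF-8.
        $dom->loadHTML( '<?xml encoding="UTF-8">' . $content );
        libxml_clear_errors();
        libxml_use_internal_errors( $previous );
    
        // Every <img> element, however deeply it is nested.
        $dxp = new DOMXPath( $dom );
        $images = $dxp->query( '//img' );
    
        // One value per image; getAttribute() returns '' for a missing
        // attribute, so the src and alt lists keep the same length.
        $values = array_map( function( $img ) use ( $attribute_name ) {
            return trim( $img->getAttribute( $attribute_name ) );
        }, iterator_to_array( $images ) );
    
        // One value per line.
        return implode( PHP_EOL, $values );
    }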
    

    Template

    The content_encoded item node from the Simplifyingthemarket feed is preferable, as it contains more images than the description node.

    Under the Images section of WP All Import's Edit Template page:

    1. "Download images hosted elsewhere" should be selected and given the source list:
    • The separator field after "Enter image URL one per line, or separate them with a" must be left blank.
    • Main input value:
    [get_img_attrs_from_html({content_encoded[1]},"src")]
    
    2. "Scan through post content and import images wrapped in <img> tags" must be enabled.
    3. "SEO & Advanced Options -> Set Alt Text(s)" should be selected and given the alternative text list:
    • The separator field after "Enter one per line, or separate them with a" must be left blank.
    • Main input value:
    [get_img_attrs_from_html({content_encoded[1]},"alt")]
    

    Once these settings are saved, the import can be started.
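Because both fields receive one value per line, the Nth alt line is meant to describe the Nth image URL. The sketch pairs two such lists by position to show what "two lists of the same size" buys. pair_image_lists() is a hypothetical helper that illustrates the idea under that assumption; it is not WP All Import's internal code.

```php
<?php
// Hypothetical helper: pair newline-separated src and alt lists by position.
function pair_image_lists( string $srcs, string $alts ): array {
    // Normalize line endings so "\r\n" and "\n" both split cleanly.
    $src_lines = explode( "\n", str_replace( "\r\n", "\n", $srcs ) );
    $alt_lines = explode( "\n", str_replace( "\r\n", "\n", $alts ) );

    // Lists of different lengths would shift alt texts onto the wrong images.
    if ( count( $src_lines ) !== count( $alt_lines ) ) {
        throw new LengthException( 'src and alt lists differ in length' );
    }

    // Pair by index (array_combine would collapse duplicate URLs).
    return array_map(
        fn( $src, $alt ) => [ 'src' => $src, 'alt' => $alt ],
        $src_lines,
        $alt_lines
    );
}

$pairs = pair_image_lists(
    "x.png\nno-alt.png\ny.png",
    "WHAT I WANT TO GRAB\n\nANOTHER, AN ATTRIBUTE, SHOULD BE GRABBED"
);

foreach ( $pairs as $pair ) {
    echo $pair['src'] . ' => "' . $pair['alt'] . '"' . PHP_EOL;
}
```

The empty middle line keeps no-alt.png paired with an empty alt text instead of letting the next alt text slide up onto it.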

    Test

    The attribute extraction function can be tested outside of an import:

    $item = <<<XML
    <item><content_encoded><![CDATA[
    text
    <img src="x.png" alt="WHAT I WANT TO GRAB">
    <p>
        <img src="no-alt.png">
        <strong>tag</strong>
        <img src="y.png" alt="ANOTHER, AN ATTRIBUTE, SHOULD BE GRABBED">
    </p>
    ]]></content_encoded></item>
    XML;
    $idom = new DOMDocument;
    @$idom->loadXML( $item );
    $cnode = ( new DOMXPath( $idom ) )->query( '//content_encoded' );
    $content = $cnode->item( 0 )->textContent;
    
    echo get_img_attrs_from_html( $content, 'src' ) . PHP_EOL;
    echo get_img_attrs_from_html( $content, 'alt' ) . PHP_EOL;
    
    Output:
    
    x.png
    no-alt.png
    y.png
    WHAT I WANT TO GRAB
    
    ANOTHER, AN ATTRIBUTE, SHOULD BE GRABBED