
Function to get the embedded content and links from a WordPress post (from 1 to all of them)


I am making a custom design that I will use with WordPress (not a theme). I created a function to get one, several, or all embedded items in a post. It works with <img>, <audio>, <video>, <iframe> and <a> tags, but it can easily be edited to get any HTML tag. I use it in a loop:

if (have_posts()) : while (have_posts()) : the_post();

I tested it briefly for all tags and different numbers of items, and it works properly with everything except iframes. The function gets the src attribute and then recreates the whole element, with additional attributes if needed. I'll post the whole function below; it is a little long.

So I would like to get the iframe source. I tried many different preg_match patterns to get it, but none of them worked. Strangely, the 'video' branch finds <video> tags whether that pattern is tried first or in the fallback if block, but it never finds iframes.

Since I am neither a WordPress nor a PHP developer, I would really appreciate any other remarks, whether about security concerns or about something I am doing wrong. I would also like to know whether I am using ob_start() properly, and whether it makes this function lighter on the server when there are many visitors at the same time. I am also interested in a better way to pass the arguments to the function.

Later I will add a wrapper for each individual item, which is very useful (for example, to create a menu from a post's links). I hope someone finds the function useful, especially once the iframes are fixed.

This is the function:

// $universal_modifier for img size ('thumbnail') or link target '_blank'
// example: get_the_customized_post_content('link', 'all', 'link-class', '_blank');
// example: get_the_customized_post_content('image', 1, 'image-class', 'custom-thumbnail');

function get_the_customized_post_content($item_type = null, $items_num = null, $item_classes = null, $universal_modifier = null)
{
    // PHP automatically flushes open output buffers when it reaches the end of a script
    ob_start();
    global $post;
    $single_img = false;

    if ($items_num) {
        if ($item_type === 'image') {
            if ($items_num === 1 && $universal_modifier) {
                $single_img = true;
                // this will get the featured image, which allows for getting
                // the 'full' img with its sourceset
                the_post_thumbnail($universal_modifier, array(
                    'class' => esc_attr($item_classes),
                    'alt' => esc_html(get_the_title())
                ));
            } else {
                preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $item_src);
                $additional_attr = '';
                $the_html_tag = '<img';
                $close_tag = '';
            }
        } elseif ($item_type === 'audio') {
            // the post body is stored in post_content ($post->content does not exist)
            preg_match_all('/<audio.+src=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $item_src);
            $additional_attr = 'preload=none loading=lazy controls';
            $the_html_tag = '<audio';
            $close_tag = '</audio>';
        } elseif ($item_type === 'video') {
            preg_match_all('/<iframe.+src=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $item_src);
            $additional_attr = 'loading=lazy frameborder=0 allowfullscreen';
            $the_html_tag = '<iframe';
            $close_tag = '</iframe>';
            if (count($item_src[1]) === 0) {
                echo 'Getting videos';
                preg_match_all('/<video.+src=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $item_src);
                $additional_attr = 'preload=metadata loading=lazy controls';
                $the_html_tag = '<video';
                $close_tag = '</video>';
            }
        } elseif ($item_type === 'link') {
            preg_match_all('/<a.+href=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $item_src);
            preg_match_all('/<a.+>([^\'"]+)<\/a>/i', $post->post_content, $anchor_text);
            if ($universal_modifier) {
                $additional_attr = 'target=' . $universal_modifier;
            } else {
                $additional_attr = '';
            }
            $the_html_tag = '<a';
            $close_tag = '</a>';
        } else {
            echo '<p class="' . esc_attr('not-found-info') . '">' . esc_html('Media not found...') . '</p>';
        }
    }


    if ($single_img) {
        $display_item = ob_get_clean();
        echo $display_item;
    } else {
        if (count($item_src[1]) === 0) {
            echo '<p class="' . esc_attr('not-found-info') . '">' . esc_html('Media not found...') . '</p>';
        } else {
            $num_of_items = count($item_src[0]);
            if ($items_num === 'all') {
                $items_total = $num_of_items;
            } else {
                $items_total = min($num_of_items, $items_num);
                // get the smaller number of the two
            }

            if ($items_total > 0) {
                for ($i = 0; $i < $items_total; $i++) {

                    if ($item_type === 'link') {
                        $source_type = 'href';
                        // anchor text captured for this link, if any
                        $item_content = isset($anchor_text[1][$i]) ? esc_html($anchor_text[1][$i]) : '';
                    } else {
                        $source_type = 'src';
                        $item_content = '';
                    }

                    $the_item = $the_html_tag . ' class="' . esc_attr($item_classes) . '" ' . $source_type . '="' . esc_url($item_src[1][$i]) . '" ' . esc_attr($additional_attr) . '>' . $item_content . $close_tag;
                    echo $the_item;
                };
            };
        };
    }
}

Solution

  • I had to change the function and I think it is better now. It works completely and is easy to use. The function is quite long…

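    I used var_dump($post->post_content) and saw that there were no iframes in the post content. The stored content holds only what was entered in the editor (for an embed, typically just the URL); the <iframe> markup is generated when the content is rendered for the page.

    Therefore I used get_media_embedded_in_content() on the rendered content, with the appropriate media types, to get the desired elements, and then preg_match() to extract what is needed from each one.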
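
    Once the rendered HTML of a single embed is available, a DOM parser is one alternative to preg_match() for reading its src. The sketch below is not part of the function above: extract_src_values() is a hypothetical helper name, the sample markup is made up, and it assumes PHP's DOM extension is enabled.

    ```php
    <?php
    // Hypothetical helper: collect the src attribute of every <$tag_name>
    // element found in an HTML fragment.
    function extract_src_values(string $html, string $tag_name): array
    {
        if (trim($html) === '') {
            return array();
        }

        $doc = new DOMDocument();
        // Fragments are rarely valid documents, so parser warnings are
        // collected internally instead of being printed.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML('<div>' . $html . '</div>');
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $sources = array();
        foreach ($doc->getElementsByTagName($tag_name) as $element) {
            if ($element->hasAttribute('src')) {
                $sources[] = $element->getAttribute('src');
            }
        }
        return $sources;
    }

    // Made-up sample of what a rendered embed might look like.
    $html = '<p>Intro</p><iframe width="560" src="https://example.com/embed/1" allowfullscreen></iframe>';
    print_r(extract_src_values($html, 'iframe'));
    ```

    A parser does not care about attribute order, quoting style, or line breaks inside the tag, which are the usual places where a hand-written pattern goes wrong.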

    When only one image per post is required, the function gets the post thumbnail at 'full' size (the post has to have a featured image; there are plugins that create one automatically). This returns the image with its srcset attribute, which makes it responsive. For more than one image per post, it returns the full image size.

    I added an optional wrapper for each individual item, and classes can be added to it. It is useful for wrapping images in an <a> tag (the href is created automatically). It is also useful for wrapping links in <li> tags to build menus from a post's links (personally, that was one of the most important things for me in this case). If a link target is set, it is used for all links, whether they are items or wrappers.

    Only images can be properly wrapped in an <a> tag; other items will be wrapped, but the wrapper will not have an href. If $item_type is set to 'video', it will find both <video> and <iframe> tags. It searches for <iframe> first, because videos are more likely to be embedded from external sources than self-hosted.

    <video> and <audio> will have controls. <video> will have preload="metadata", and <audio> preload="none". <iframe> will have frameborder="0" allowfullscreen. All except the links will have loading="lazy". The HTML specification defines loading only for <img> and <iframe>, so on <audio> and <video> it has no effect; it does no harm, though, and can be removed later if it turns out to be useless.

    As far as I can see, I escaped all html/attribute/url outputs.
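
    One value that does reach the output without escaping is the wrapper tag name ($item_wrap), which is concatenated straight into the markup. A minimal sketch of restricting it to a fixed allow-list follows; sanitize_wrap_tag() and the list of tags are assumptions for illustration, not part of the function.

    ```php
    <?php
    // Hypothetical helper: return the wrapper tag only if it is on a fixed
    // allow-list, otherwise return an empty string (meaning "no wrapper").
    function sanitize_wrap_tag($tag, array $allowed = array('a', 'div', 'li', 'span', 'figure', 'p'))
    {
        if (!is_string($tag)) {
            return '';
        }
        $tag = strtolower(trim($tag));
        return in_array($tag, $allowed, true) ? $tag : '';
    }

    var_dump(sanitize_wrap_tag('LI'));     // allowed, normalised to lower case
    var_dump(sanitize_wrap_tag('script')); // rejected
    var_dump(sanitize_wrap_tag(false));    // "no wrapper" stays empty
    ```

    Because the result is either a known tag or an empty string, the existing if ($item_wrap) checks would keep working if the argument were passed through such a helper first.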

    The function is called inside the loop, in the category page:

    if (have_posts()) : while (have_posts()) : the_post();
    
        get_the_customized_post_content(
            $item_type = 'image',
            $items_num = 'all',
            $item_classes = 'item-image-class class-two',
            $item_wrap = 'a',
            $item_wrap_classes = 'item-wrap-class one-more-class',
            $the_link_target = '_blank'
        );
    
        endwhile;
    wp_reset_postdata();
    endif;
    

    or:

    query_posts(array(
        'category_name' => 'your-cat-name',
        'posts_per_page' => 1
    ));
    
    if (have_posts()) : while (have_posts()) : the_post();
    
            get_the_customized_post_content(
                $item_type = 'audio',
                $items_num = 1,
                $item_classes = 'item-audio-class',
                $item_wrap = 'div',
                $item_wrap_classes = 'item-wrap-class class-two',
                $link_target = false
            );
    
        endwhile;
        wp_reset_postdata();
        wp_reset_query();
    endif;
    

    $items_num can be 'all' or any number from 1 upward. If more items are requested than the post has, the function returns all the items the post has.
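
    That rule can be isolated in a few lines. This is only a sketch of the same 'all' / min() logic; resolve_items_total() is a hypothetical name and does not exist in the function.

    ```php
    <?php
    // Hypothetical helper mirroring the 'all' / min() logic of the function.
    function resolve_items_total($items_num, int $num_of_items): int
    {
        if ($items_num === 'all') {
            return $num_of_items;
        }
        // Never return more than the post has, and never a negative count.
        return max(0, min($num_of_items, (int) $items_num));
    }

    var_dump(resolve_items_total('all', 5)); // every item the post has
    var_dump(resolve_items_total(3, 5));     // the requested number
    var_dump(resolve_items_total(10, 5));    // capped at what the post has
    ```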

    Set anything that is not needed to false (… $item_wrap = false, …). There is surely a better way to write this, passing only the arguments one needs and leaving out all that are false, but I do not know how to do it.
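
    One common way to pass only the arguments that are needed is a single options array merged over defaults. The sketch below uses plain PHP; customized_content_args() is a hypothetical helper, and the keys simply mirror the parameter names of the function.

    ```php
    <?php
    // Hypothetical helper: merge the caller's options over a set of defaults.
    function customized_content_args(array $args = array()): array
    {
        $defaults = array(
            'item_type'         => null,
            'items_num'         => null,
            'item_classes'      => null,
            'item_wrap'         => null,
            'item_wrap_classes' => null,
            'link_target'       => null,
        );
        // array_intersect_key() drops unknown keys;
        // array_merge() lets the caller's values override the defaults.
        return array_merge($defaults, array_intersect_key($args, $defaults));
    }

    // Only the two options that matter are written out.
    $args = customized_content_args(array('item_type' => 'audio', 'items_num' => 1));
    print_r($args);
    ```

    WordPress ships wp_parse_args() for the same kind of merge, and on PHP 8.0 or later the existing signature can also be called with named arguments, for example get_the_customized_post_content(item_type: 'audio', items_num: 1).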

    I would still like to hear the opinions of WordPress/PHP developers, especially on the function's performance and on the coding. I'm sure it can be written better…

    Edit:

    • Improved preg_match_all for links creation.
    • Added multiple images alt tag.
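
    The alt text for multiple images is derived from the image file name. The steps can be tried in isolation with the condensed sketch below; alt_from_image_url() is a hypothetical name and the URL is made up.

    ```php
    <?php
    // Hypothetical helper: build a readable alt text from an image URL.
    function alt_from_image_url(string $url): string
    {
        // File name without directory and extension, e.g. "my-photo_01".
        $filename = pathinfo($url, PATHINFO_FILENAME);
        // Turn dashes and underscores into spaces, then collapse runs of whitespace.
        $words = preg_replace(array('/[-_]+/', '/\s+/'), ' ', $filename);
        return ucwords(trim($words));
    }

    echo alt_from_image_url('https://example.com/wp-content/uploads/my-photo_01.jpg');
    // prints "My Photo 01"
    ```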

    Also you have to remove:

    • Featured Image width/height inline attributes
    • Featured Image WordPress default classes

    // remove Featured Image width/height inline attributes
    function remove_img_size_attr($html)
    {
        $html = preg_replace('/(width|height)="\d+"\s/', "", $html);
        return $html;
    }
    add_filter('post_thumbnail_html', 'remove_img_size_attr', 10);
    add_filter('image_send_to_editor', 'remove_img_size_attr', 10);
    
    // remove Featured Image classes
    remove_action('begin_fetch_post_thumbnail_html', '_wp_post_thumbnail_class_filter_add');
    

    All of this goes in functions.php.

    Here is the function (place it in functions.php):

    function get_the_customized_post_content(
        $item_type = null,
        $items_num = null,
        $item_classes = null,
        $item_wrap = null,
        $item_wrap_classes = null,
        $link_target = null
    ) {
        global $post;
        $not_found_class = 'media-not-found'; // class for the p tag container for "Nothing found" info
        $num_of_items = 0; // stays 0 when the arguments are missing or the item type is unknown
    
        if ($item_type && $items_num) {
            if ($item_type === 'video') {
                $the_item_media = get_media_embedded_in_content(apply_filters('the_content', get_the_content(), -1), array('video', 'iframe'));
                $num_of_items = count($the_item_media);
            } elseif ($item_type === 'audio') {
                $the_item_media = get_media_embedded_in_content(apply_filters('the_content', get_the_content(), -1), array('audio'));
                $num_of_items = count($the_item_media);
            } elseif ($item_type === 'image') {
                if ($items_num === 1) {
                    // this gets the featured image, which allows for getting
                    // the 'full' img with its srcset;
                    // get_the_post_thumbnail() returns the markup instead of echoing it,
                    // so no output buffering (ob_start) is needed
                    $featured_img = get_the_post_thumbnail(get_the_ID(), 'full', array(
                        'class' => esc_attr($item_classes),
                        'alt' => esc_attr(get_the_title())
                    ));
                    $the_item_url = get_the_post_thumbnail_url(get_the_ID(), 'full');
                    $num_of_items = 1;
                } else {
                    // [^>] keeps each match inside a single <img> tag, even when several tags share a line
                    preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', $post->post_content, $item_src);
                    $the_item_media = $item_src[1];
                    $the_item_url = $item_src[1];
                    $the_title_attr = $item_src[1];
                    $num_of_items = count($the_item_media);
                };
            } elseif ($item_type === 'link') {
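                // "<a\s" avoids matching <audio> or <article>; [^>] and [^<] keep each match inside one link
                preg_match_all('/<a\s[^>]*href=[\'"]([^\'"]+)[\'"][^>]*>([^<]+)<\/a>/i', $post->post_content, $link_parts);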
                $num_of_items = count($link_parts[0]);
            } else {
                echo '<p class="' . esc_attr($not_found_class) . '">' . esc_html('Please enter the item type.') . '</p>';
            };
        } else {
            echo '<p class="' . esc_attr($not_found_class) . '">' . esc_html('Please enter the item type and the number of items to return...') . '</p>';
        };
    
        if ($num_of_items > 0) {
    
            if ($link_target) {
                $link_target_window = 'target=' . esc_attr($link_target);
            } else {
                $link_target_window = '';
            };
    
            if ($items_num === 'all') {
                $items_total = $num_of_items;
            } else {
                $items_total = min($num_of_items, $items_num);
                // get the smaller number of the two
            };
    
            if ($item_type === 'image') {
    
                if ($items_num === 1) {
                    if ($item_wrap) {
                        if ($item_wrap === 'a') {
                            echo '<a class="' . esc_attr($item_wrap_classes) . '" href="' . esc_url($the_item_url) . '" ' . $link_target_window . '>';
                        } else {
                            echo '<' . $item_wrap . ' class="' . esc_attr($item_wrap_classes) . '">';
                        }
                    };
    
                    echo $featured_img;
    
                    if ($item_wrap) {
                        echo '</' . $item_wrap . '>';
                    }
                } else {
                    for ($i = 0; $i < $items_total; $i++) {
                        if ($item_wrap) {
                            if ($item_wrap === 'a') {
                                echo '<a class="' . esc_attr($item_wrap_classes) . '" href="' . esc_url($the_item_url[$i]) . '" ' . $link_target_window . '>';
                            } else {
                                echo '<' . $item_wrap . ' class="' . esc_attr($item_wrap_classes) . '">';
                            }
                        };
    
                        $img_src = pathinfo($item_src[1][$i]);
                        $img_alt_tag = $img_src['filename'];
    
                        $fix_filename = array();
                        $fix_filename[0] = '/-/';
                        $fix_filename[1] = '/_/';
                        $fix_filename[2] = '/\s\s+/';
    
                        $img_alt_tag = ucwords(preg_replace($fix_filename, ' ', $img_alt_tag));
    
                        echo '<img class="' . esc_attr($item_classes) . '" alt="' . esc_attr($img_alt_tag) . '" src="' . esc_url($item_src[1][$i]) . '" loading=' . esc_attr('lazy') . '>';
    
                        if ($item_wrap) {
                            echo '</' . $item_wrap . '>';
                        }
                    };
                }
            } elseif ($item_type === 'link') {
                for ($i = 0; $i < $items_total; $i++) {
    
                    if ($item_wrap) {
                        echo '<' . $item_wrap . ' class="' . esc_attr($item_wrap_classes) . '">';
                    };
    
                    echo '<a class="' . esc_attr($item_classes) . '" href="' . esc_url($link_parts[1][$i]) . '" ' . esc_attr($link_target_window) . '>' . esc_html($link_parts[2][$i]) . '</a>';
    
                    if ($item_wrap) {
                        echo '</' . $item_wrap . '>';
                    };
                };
            } elseif ($item_type === 'video') {
                for ($i = 0; $i < $items_total; $i++) {
    
                    preg_match('/<iframe.+src=[\'"]([^\'"]+)[\'"].*>/i', $the_item_media[$i], $item_src);
    
                    if ($item_wrap) {
                        echo '<' . $item_wrap . ' class="' . esc_attr($item_wrap_classes) . '">';
                    };
    
                    if (!empty($item_src[1])) {
                        echo '<iframe class="' . esc_attr($item_classes) . '" loading="lazy" src="' .
                            esc_url($item_src[1]) . '" frameborder="0" allowfullscreen></iframe>';
                    } else {
                        preg_match('/<video.+src=[\'"]([^\'"]+)[\'"].*>/i', $the_item_media[$i], $item_src);
    
                            if (!empty($item_src[1])) {
                            echo '<video class="' . esc_attr($item_classes) . '" loading="lazy" src="' .
                                esc_url($item_src[1]) . '" preload="metadata" controls></video>';
                        } else {
                            echo '<p class="' . esc_attr($not_found_class) . '">' . esc_html('Media not found.') . '</p>';
                        };
                    };
    
                    if ($item_wrap) {
                        echo '</' . $item_wrap . '>';
                    };
                };
            } elseif ($item_type === 'audio') {
                for ($i = 0; $i < $items_total; $i++) {
    
                    preg_match('/<audio.+src=[\'"]([^\'"]+)[\'"].*>/i', $the_item_media[$i], $item_src);
    
                    if ($item_wrap) {
                        echo '<' . $item_wrap . ' class="' . esc_attr($item_wrap_classes) . '">';
                    };
    
                    if (!empty($item_src[1])) {
                        echo '<audio class="' . esc_attr($item_classes) . '" loading="lazy" src="' .
                            esc_url($item_src[1]) . '" preload="none" controls></audio>';
                    } else {
                        echo '<p class="' . esc_attr($not_found_class) . '">' . esc_html('Media not found.') . '</p>';
                    };
    
                    if ($item_wrap) {
                        echo '</' . $item_wrap . '>';
                    };
                };
            } else {
                echo '<p class="' . esc_attr($not_found_class) . '">' . esc_html('Nothing found.') . '</p>';
            };
        } else {
            echo '<p class="' . esc_attr($not_found_class) . '">' . esc_html('Media not found.') . '</p>';
        };
    };
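
    A note on patterns of the form <img.+src=…: the greedy .+ can swallow several tags that sit on one line, so only the last src is captured. The standalone sketch below compares it with a tag-bounded variant that uses [^>]; the sample markup is made up.

    ```php
    <?php
    $html = '<img src="a.jpg" alt="A"><img src="b.jpg" alt="B">';

    // Greedy: ".+" runs to the last src= on the line,
    // so both tags collapse into a single match.
    preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $html, $greedy);

    // Tag-bounded: "[^>]" cannot cross the end of a tag,
    // so each <img> is matched on its own.
    preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', $html, $bounded);

    print_r($greedy[1]);  // one entry: b.jpg
    print_r($bounded[1]); // two entries: a.jpg, b.jpg
    ```

    The bounded form only works when src sits on the tag itself, so it is a poor fit for a <video> or <audio> element that carries its source in a nested <source> tag.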