How can I remove the enclosing tags around a piece of HTML?

I am creating a custom Drupal text filter for AsciiDoc syntax using the customfilter module. I enclose the text in [asciidoc][/asciidoc] tags, and when I run it through the asciidoctor command the output is enclosed in <div class="paragraph"><p> tags.

For example, the following text inside the [asciidoc] tag, which uses AsciiDoc syntax for HTML links, produces the output shown after it:

On the markup side Drupal's contrib `markdown` filter has been somewhat iffy,
and so has the `bbcode` filter. Looking around for other more compact documenting
systems led me to the https://asciidoc.org[Asciidoc] utility and its more
advanced brother https://asciidoctor.org[Asciidoctor]. In combination with another
 Drupal module called https://drupal.org/project/customfilter[customfilter] which
makes it easy to create your own filters, I think I have hit on a combination
of modules which allow me as much freedom and fine control on my pages as I want.

<div class="paragraph">
<p>On the markup side Drupal&#8217;s contrib <code>markdown</code> filter has been somewhat iffy,
and so has the <code>bbcode</code> filter. Looking around for other more compact documenting
systems led me to the <a href="https://asciidoc.org">Asciidoc</a> utility and its more
advanced brother <a href="https://asciidoctor.org">Asciidoctor</a>. In combination with another
 Drupal module called <a href="https://drupal.org/project/customfilter">customfilter</a> which
makes it easy to create your own filters, I think I have hit on a combination
of modules which allow me as much freedom and fine control on my pages as I want.</p>
</div>

Is there a PHP function that takes an HTML string and a set of enclosing tags, and returns the inner HTML they enclose? Or perhaps a regular expression that matches the portion between the tags?

This is the desired output:

On the markup side Drupal&#8217;s contrib <code>markdown</code> filter has been somewhat iffy,
and so has the <code>bbcode</code> filter. Looking around for other more compact documenting
systems led me to the <a href="https://asciidoc.org">Asciidoc</a> utility and its more
advanced brother <a href="https://asciidoctor.org">Asciidoctor</a>. In combination with another
 Drupal module called <a href="https://drupal.org/project/customfilter">customfilter</a> which
makes it easy to create your own filters, I think I have hit on a combination
of modules which allow me as much freedom and fine control on my pages as I want.

I also asked a related question about whether Asciidoctor can be configured to avoid enclosing the output in <div class="paragraph"><p>...</p></div>: "Does asciidoctor have a setting to remove the <paragraph> and <p> tags from the source it outputs?"

Solution

In pure PHP you can use DOMDocument, which I don't recommend here because it is slow and its errors are hard to trace. For that reason I won't explain it further; here is just a link to the official documentation:

PHP DomDocument

Note: I do prefer DOMDocument when working with large documents. For example, I once had to read a whole page and extract all its elements one by one, which was nearly impossible with regex, so I used DOMDocument in that case.
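For readers who do want the DOMDocument route, here is a minimal sketch of how it could look. The function name paragraph_inner_html() is hypothetical, and the code assumes the wrapper is exactly <div class="paragraph"> with a direct <p> child; it returns the input unchanged when no such wrapper is found.

```php
<?php
// Sketch only: paragraph_inner_html() is a hypothetical helper name.
// Returns the inner HTML of the first <p> directly inside
// <div class="paragraph">, or the input unchanged if there is none.
function paragraph_inner_html(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding prefix is a common workaround so that loadHTML()
    // treats the input as UTF-8 rather than ISO-8859-1.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $paragraphs = $xpath->query('//div[@class="paragraph"]/p');
    if ($paragraphs === false || $paragraphs->length === 0) {
        return $html;
    }

    // Serialize each child of the <p> to rebuild its inner HTML.
    $inner = '';
    foreach ($paragraphs->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}
```

Note that the parser may normalize the markup it re-serializes (entity forms, for instance), so the result is not guaranteed to be byte-identical to the original inner HTML.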

Let's get back to your question. Your example shows that you are not parsing large chunks of HTML, so I recommend using a regex.

preg_match_all( '/<p>(?P<content>.*?)<\/p>/s' ,$text, $ref );
var_dump($ref['content']);

The above regex gives you the content between each pair of <p> tags.

You may play with it and make a new one like this:

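preg_match_all( '/<div class="paragraph">\s*<p>(?P<content>.*?)<\/p>\s*<\/div>/s' ,$text, $ref );

which gives you the content of each <div class="paragraph"><p>...</p></div> wrapper. The s modifier lets . match newlines, so multi-line paragraphs are captured. To accept a div with any attributes, replace <div class="paragraph"> with <div[^>]*>.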
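Since the question asks for a function that returns the inner HTML, the same idea can be wrapped up with preg_replace(). This is a sketch under assumptions: the name strip_paragraph_wrapper() is made up for illustration, and the pattern only targets the exact <div class="paragraph"><p>...</p></div> shape shown in the question.

```php
<?php
// Sketch only: strip_paragraph_wrapper() is a hypothetical helper name.
// Replaces every <div class="paragraph"><p>...</p></div> block with its
// inner HTML and leaves all other markup untouched.
function strip_paragraph_wrapper(string $html): string
{
    // ~ is used as the delimiter so the slashes in closing tags need no
    // escaping; the s modifier lets . match newlines inside the paragraph.
    $pattern = '~<div class="paragraph">\s*<p>(.*?)</p>\s*</div>~s';

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return preg_replace($pattern, '$1', $html) ?? $html;
}
```

Calling strip_paragraph_wrapper() on the asciidoctor output from the question should leave only the paragraph text with its inline <code> and <a> tags. A regex like this is fine for small, predictable output; for arbitrary HTML a real parser is the safer choice.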

Also see the link below on regex

Regex Tutorial

Good Luck