
How to save regex backreferences to an array during preg_replace or preg_replace_callback


Here's the problem: I have a database full of articles marked up in XHTML. Our application uses Prince XML to generate PDFs. An artifact of that is that footnotes are marked up inline, using the following pattern:

<p>Some paragraph text<span class="fnt">This is the text of a footnote</span>.</p>

Prince replaces every span.fnt with a numeric footnote marker, and renders the enclosed text as a footnote at the bottom of the page.

We want to render the same content in ebook formats, and XHTML is a great starting point, but the inline footnotes are terrible. What I want to do is convert the footnotes to endnotes in my ebook build script.

This is what I'm thinking:

  1. Create an empty array called $endnotes to store the endnote text.
  2. Set a variable $endnote_no to zero. This variable will hold the current endnote number, to display inline as an endnote marker, and to be used in linking the endnote marker to the particular endnote.
  3. Use preg_replace or preg_replace_callback to find every instance of <span class="fnt">(.*?)</span>.
  4. Increment $endnote_no for each instance, and replace the inline span with '<sup><a href="#endnote_' . $endnote_no . '">' . $endnote_no . '</a></sup>'.
  5. Push the footnote text to the $endnotes array so that I can use it at the end of the document.
  6. After replacing all the footnotes with numeric endnote references, iterate through the $endnotes array to spit out the endnotes as an ordered list in XHTML.

This process is a bit beyond my PHP comprehension, and I get lost when I try to translate this into code. Here's what I have so far, which I mainly cobbled together based on code examples I found in the PHP documentation:

$endnotes = array();
$endnote_no = 0;
class Endnoter {

  public function replace($subject) {
    $this->endnote_no = 0;
    return preg_replace_callback('`<span class="fnt">(.*?)</span>`', array($this, '_callback'), $subject);
  }

  public function _callback($matches) {
    array_push($endnotes, $1);
    return '<sup><a href="#endnote_' . $this->endnote_no++ . '">' . $this->endnote_no . '</a></sup>';
  }
}

...

$replacer = new Endnoter();
$replacer->replace($body);
echo '<pre>';
print_r($endnotes); // Just checking to see if the $endnotes are there.
echo '</pre>';

Any guidance would be helpful, especially if there is a simpler way to get there.


Solution

  • Don't know about a simpler way, but you were halfway there. This seems to work.

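    I just cleaned it up a bit, moved the variables inside your class (the callback couldn't see the global $endnotes, and $1 isn't valid PHP; the captured text is in $matches[1]) and added an output method to get the footnote list.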

    class Endnoter
    {
        private $number_of_notes = 0;
        private $footnote_texts = array();
    
        public function replace($input) {
    
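            // Lazy (.*?) so each span is matched on its own; the "s" flag lets a footnote run over several lines.
            return preg_replace_callback('#<span class="fnt">(.*?)</span>#is', array($this, 'replace_callback'), $input);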
    
        }
    
        protected function replace_callback($matches) {
    
            // the text sits in the matches array
            // see http://php.net/manual/en/function.preg-replace-callback.php
            $this->footnote_texts[] = $matches[1];
    
            return '<sup><a href="#endnote_'.(++$this->number_of_notes).'">'.$this->number_of_notes.'</a></sup>';
    
        }
    
        public function getEndnotes() {
            $out = array();
            $out[] = '<ol>';
    
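            // Give each item the id that the inline markers link to (#endnote_N).
            foreach($this->footnote_texts as $i => $text) {
                $out[] = '<li id="endnote_'.($i + 1).'">'.$text.'</li>';
            }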
    
            $out[] = '</ol>';
    
            return implode("\n", $out);
        }
    
    }
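
As for a simpler way: if you don't need a class, a closure that captures the array by reference can do the same job. This is only a sketch, assuming PHP 7 or later; the function names convert_footnotes and render_endnotes and the sample $body are made up for illustration, and the pattern assumes a footnote never contains a nested span.

```php
<?php
// Hypothetical helper (not part of the class above): replaces each inline
// footnote span with a numbered marker and collects the footnote texts.
function convert_footnotes(string $body): array
{
    $endnotes = array();

    $body = preg_replace_callback(
        // Lazy match so each span is handled separately; "s" lets "." match newlines.
        '#<span class="fnt">(.*?)</span>#is',
        // use (&$endnotes) captures the array by reference, so the callback can append to it.
        function (array $matches) use (&$endnotes) {
            $endnotes[] = $matches[1];      // the captured group is the footnote text
            $n = count($endnotes);          // 1-based number of the note just stored
            return '<sup><a href="#endnote_' . $n . '">' . $n . '</a></sup>';
        },
        $body
    );

    return array($body, $endnotes);
}

// Hypothetical helper: renders the collected texts as an ordered list
// whose item ids match the href targets of the inline markers.
function render_endnotes(array $endnotes): string
{
    $out = array('<ol>');
    foreach ($endnotes as $i => $text) {
        $out[] = '<li id="endnote_' . ($i + 1) . '">' . $text . '</li>';
    }
    $out[] = '</ol>';

    return implode("\n", $out);
}

// Made-up sample input with two footnotes in one paragraph.
$body = '<p>First<span class="fnt">Note one</span>, second<span class="fnt">Note two</span>.</p>';

list($converted, $endnotes) = convert_footnotes($body);

echo $converted, "\n";
echo render_endnotes($endnotes), "\n";
```

The class version keeps the counter and the texts together, which is handy if you process several chapters with separate instances; the closure version is shorter when you only need one pass over one document.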