
Loop through html files, get file name and insert into each file


I'm migrating a website to WordPress. The old website uses a custom-built posting system where a PHP template makes a call to separate static HTML files for each post. There are quite a number of posts to migrate (over 1000).

I'm using a plugin that can import the HTML files and turn each one into a WordPress post, but it's important that the original date for each post is set correctly. Conveniently, the plugin allows me to select the date for each post from an HTML tag in each file.

My problem is that the dates are all in the file names, not the files themselves. The files are all named in yymmdd format (year, month, day, with no dashes), so they look like:
"130726.htm" (for July 26th, 2013)
"121025.htm" (for October 25th, 2012)

So basically I need a way to loop through a directory of these files and, for each one, get the file name, add slashes, and then insert it into the file after <body> in a tag like:
<p class="origDate">13/07/26</p>

I'm not sure of the best way to go about this: a Python script, a Notepad++ macro, a batch file, or whatever else. Can anyone offer any help, tips, or suggestions? They'd be greatly appreciated!

Thanks in advance!


Solution

  • I misunderstood the question when I wrote my first script.

    This script scans the files in the dates directory (I have assumed that the directory contains only HTML files named in your format), opens each file, and inserts the paragraph right after the opening <body> tag.

    Sample contents of dates folder:

    121214.html 121298.html 121299.html

    PHP script (placed in the same directory as the dates folder):

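    <?php
    // Directory holding the HTML files named YYMMDD.html.
    $dir = "dates";

    // List the directory and drop the "." and ".." entries.
    $a = array_diff(scandir($dir), array(".", ".."));

    foreach ($a as $value)
    {
       $filename = $dir . "/" . $value;
       $string = file_get_contents($filename);

       // Strip the ".html" extension (5 characters) to leave YYMMDD.
       $newstring = substr($value,0,-5);
       $newstring1 = substr($newstring,0,2); // year
       $newstring2 = substr($newstring,2,2); // month
       $newstring3 = substr($newstring,4,2); // day
       $para =  '<p class="origDate">' .$newstring1 . "/" . $newstring2 . "/" . $newstring3 . '</p>' . "<br>";

       // Match the opening <body> tag with any attributes.
       // [^>]* is used because a range such as "-: inside a character class is easy to get wrong.
       $pattern = '/<body[^>]*>/i';
       $replacement = '${0}'.$para;

       // The limit of 1 inserts the paragraph after the first <body> tag only.
       $newpara = preg_replace($pattern, $replacement, $string, 1);

       // Overwrite the file with the modified markup.
       file_put_contents($filename, $newpara);
    }
    ?>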
    

    I have used .html here. To use .htm (as in your file names), change this line:

    $newstring = substr($value,0,-5);
    

    to

    $newstring = substr($value,0,-4);
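
    To avoid editing the script for each extension, the base name could be taken with pathinfo() instead. Below is a minimal sketch, not part of the script above. The helper name origDateFromFilename() is hypothetical, and it assumes that two-digit years fall in the 2000s. It also uses checkdate() to reject names that are not real dates, such as 121298.html:

    ```php
    <?php
    // Hypothetical helper: turn a file name such as "130726.htm" into
    // "13/07/26", or return null when the name is not a valid YYMMDD date.
    function origDateFromFilename(string $filename): ?string
    {
        // pathinfo() drops the extension whatever its length (.htm or .html).
        $base = pathinfo($filename, PATHINFO_FILENAME);

        // Expect exactly six digits: YYMMDD.
        if (!preg_match('/^(\d{2})(\d{2})(\d{2})$/D', $base, $m)) {
            return null;
        }

        // Assumption: two-digit years belong to the 2000s.
        // checkdate() takes month, day, year in that order.
        if (!checkdate((int) $m[2], (int) $m[3], 2000 + (int) $m[1])) {
            return null;
        }

        return $m[1] . '/' . $m[2] . '/' . $m[3];
    }
    ```

    Inside the loop, files for which the helper returns null could simply be skipped with continue.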
    

    Sample HTML before:

    <!DOCTYPE html>
    <html>
    
    <body marginwidth=0 style="margin-left: 30px;" onclick="myfunction()">
    
    <ul><li>Coffee</li><li>Tea</li></ul>
    
    </body>
    </html>
    

    Sample HTML after:

    <!DOCTYPE html>
    <html>
    
    <body marginwidth=0 style="margin-left: 30px;" onclick="myfunction()"><p class="origDate">12/12/14</p><br>
    
    <ul><li>Coffee</li><li>Tea</li></ul>
    
    </body>
    </html>
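
    One thing to watch: the files are rewritten in place, so running the script a second time would insert the paragraph again. A minimal sketch of a guard follows, assuming a hypothetical helper named insertOrigDate(). It is a variation on the approach above, not a drop-in copy: it leaves out the trailing <br> and uses preg_replace_callback() rather than preg_replace():

    ```php
    <?php
    // Hypothetical helper: insert the date paragraph after the opening <body>
    // tag, unless the markup already contains an origDate paragraph.
    function insertOrigDate(string $html, string $date): string
    {
        // Skip markup that was already processed by an earlier run.
        if (strpos($html, 'class="origDate"') !== false) {
            return $html;
        }

        $para = '<p class="origDate">' . $date . '</p>';

        // A callback avoids any special meaning of "$" or "\" in the replacement.
        $result = preg_replace_callback(
            '/<body[^>]*>/i',
            function (array $m) use ($para): string {
                return $m[0] . $para;
            },
            $html,
            1 // only the first <body> tag
        );

        // preg_replace_callback() returns null on a regex error; keep the input then.
        return $result ?? $html;
    }

    $before = "<html>\n<body style=\"margin-left: 30px;\">\n<ul><li>Coffee</li></ul>\n</body>\n</html>";
    $once  = insertOrigDate($before, '13/07/26');
    $twice = insertOrigDate($once, '13/07/26');
    var_dump($once === $twice); // prints bool(true): the second call changes nothing
    ```

    It would still be wise to run any version of the script on a copy of the directory first.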