Tags: php, regex, version, text-extraction, readme

Extract version-specific upgrade notice from readme text


I'm writing a PHP function that extracts the upgrade notice for a given version from a readme text.

This is my source text:

Some stuff before this notice like a changelog with versioning and explanation text.

== Upgrade Notice ==

= 1.3.0 =

When using Master Pro, 1.3.0 is the new minimal required version!

= 1.1.0 =

When using Master Pro, 1.1.0 is the new minimal required version!

= 1.0.0 =

No upgrade - just install :)

[See changelog for all versions](https://plugins.svn.wordpress.org/master-pro/trunk/CHANGELOG.md).

This is the function:

/**
 * Parse update notice from readme file
 *
 * @param string $content
 * @param string $new_version
 *
 * @return void
 */
private function parse_update_notice( string $content, string $new_version ) {
    $regexp  = '~==\s*Upgrade Notice\s*==\s*(.*?=+\s*' . preg_quote( $new_version ) . '\s*=+\s*(.*?)(?=^=+\s*\d+\.\d+\.\d+\s*=+|$))~ms';

    if ( preg_match( $regexp, $content, $matches ) ) {
        $version = trim( $matches[1] );
        $notices = (array) preg_split( '~[\r\n]+~', trim( $matches[2] ) );

        error_log( $version );
        error_log( print_r( $notices, true ) );
    }
}

I'm stuck on the regex and can't get it to work. This was my initial idea:

  1. Only search after == Upgrade Notice ==
  2. Check whether there is a version matching $new_version
  3. Capture the matched version between the = x.x.x = markers as match 1, e.g. 1.1.0
  4. Capture the content after the version as match 2, stopping at the first empty line. The upgrade notice can span multiple lines, but it never contains an empty line.
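The four steps above can also be followed line by line instead of with one large regex, which is handy for cross-checking a pattern. This is a minimal sketch, not the question's method: `find_upgrade_notice()` is a hypothetical standalone function, and it assumes version headings always look like `= x.y.z =` with three numeric parts.

```php
<?php
/**
 * Hypothetical line-based helper (not the question's method): returns
 * [version, notice lines] for $version, or null if no matching entry exists.
 */
function find_upgrade_notice(string $content, string $version): ?array
{
    $in_section = false; // true once "== Upgrade Notice ==" has been seen
    $in_entry = false;   // true once the "= $version =" heading has been seen
    $notice = [];

    foreach (preg_split('~\R~', $content) as $line) {
        $trimmed = trim($line);

        // Step 1: ignore everything before the "== Upgrade Notice ==" heading.
        if (!$in_section) {
            $in_section = (bool) preg_match('~^==\s*Upgrade Notice\s*==$~', $trimmed);
            continue;
        }

        $is_version = (bool) preg_match('~^=\s*(\d+\.\d+\.\d+)\s*=$~', $trimmed, $m);
        $is_section = (bool) preg_match('~^==.*==$~', $trimmed);

        if ($in_entry) {
            // Step 4: collect non-empty lines; stop at the first empty line
            // after the text, or at the next version or section heading.
            if ($is_version || $is_section || ($trimmed === '' && $notice)) {
                break;
            }
            if ($trimmed !== '') {
                $notice[] = $trimmed;
            }
            continue;
        }

        if ($is_section) {
            break; // left the Upgrade Notice section without a match
        }

        // Steps 2 and 3: the heading must carry exactly the requested version.
        if ($is_version && $m[1] === $version) {
            $in_entry = true;
        }
    }

    return $in_entry ? [$version, $notice] : null;
}
```

For the readme above, `find_upgrade_notice($readme, '1.1.0')` returns `['1.1.0', ['When using Master Pro, 1.1.0 is the new minimal required version!']]`. Note that each notice line is trimmed, so leading indentation is not preserved.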

Solution

  • To get the part right after "Upgrade Notice", matching only the first following block of non-empty lines, you can omit the s flag so that the dot does not match a newline, and capture all following lines that contain at least one non-whitespace character. Note that this pattern only matches when the requested version is the first entry below the heading.

    ^==\h*Upgrade Notice\h*==\R\s*^=\h*(1\.3\.0)\h*=\R\s*^((?:\h*\S.*(?:\R\h*\S.*)*)+)
    

    The line in PHP:

    $regexp = '~^==\h*Upgrade Notice\h*==\R\s*^=\h*(' . preg_quote( $new_version, '~' ) . ')\h*=\R\s*^((?:\h*\S.*(?:\R\h*\S.*)*)+)~m';
    



    If you want to choose which entry after "Upgrade Notice" is matched, you can use a quantifier (again with the m flag) to skip a given number of version headings:

    ^==\h*Upgrade Notice\h*==(?:(?:\R(?!=\h*\d+\.\d+\.\d+\h*=$).*)*\R=\h*(\d+\.\d+\.\d+)\h*=$\s*){2}^((?:\h*\S.*(?:\R\h*\S.*)*)+)
    
    • ^ Start of line (the m flag is set)
    • ==\h*Upgrade Notice\h*== The starting pattern, where \h* matches optional horizontal whitespace characters
    • (?: Non-capturing group
      • (?:\R(?!=\h*\d+\.\d+\.\d+\h*=$).*)* Match all following lines that are not a version heading
      • \R=\h* Match a newline and = followed by optional horizontal whitespace characters
      • (\d+\.\d+\.\d+) Capture group 1, match the version
      • \h*=$\s* Match optional horizontal whitespace characters and =, assert the end of the line, then match optional whitespace characters
    • ){2} Close the group and repeat it n times (here {2}); group 1 keeps the version from the last repetition, i.e. the n-th entry
    • ^ Start of line
    • ( Capture group 2
      • (?:\h*\S.*(?:\R\h*\S.*)*)+ Match 1 or more lines that contain at least a single non whitespace character
    • ) Close the group

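The first, version-based pattern can be wrapped in a function along the lines of the question's `parse_update_notice()`. This is a sketch rather than the original class method: it is written as a standalone function and returns `[version, lines]` (or `null`) instead of calling `error_log()`, which is an assumption about how the result will be used. The pattern only finds the requested version when it is the first entry below the heading.

```php
<?php
/**
 * Standalone sketch of the first pattern: returns [version, notice lines],
 * or null. Only matches when $new_version is the FIRST entry below the heading.
 */
function parse_update_notice(string $content, string $new_version): ?array
{
    // Pass the delimiter to preg_quote() so a "~" in the version would be escaped too.
    $regexp = '~^==\h*Upgrade Notice\h*==\R\s*^=\h*(' . preg_quote($new_version, '~')
        . ')\h*=\R\s*^((?:\h*\S.*(?:\R\h*\S.*)*)+)~m';

    if (!preg_match($regexp, $content, $matches)) {
        return null;
    }

    $version = trim($matches[1]);
    // Split the captured block into lines, as the question's code does.
    $notices = preg_split('~\R~', trim($matches[2]));

    return [$version, $notices];
}
```

With the readme from the question, `parse_update_notice($readme, '1.3.0')` returns the 1.3.0 notice, while `parse_update_notice($readme, '1.1.0')` returns `null` because 1.1.0 is not the first entry.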
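The positional pattern can be used from PHP in the same way. In this sketch the count is a hypothetical `$n` parameter (1 = first entry below the heading) that is substituted into the `{n}` quantifier, and the closing group is the one from the breakdown, `^((?:\h*\S.*(?:\R\h*\S.*)*)+)`, which accepts single-line notices as well as multi-line ones. The m flag is required so that `^` and `$` work per line.

```php
<?php
/**
 * Hypothetical helper around the positional pattern: returns the $n-th entry
 * below "== Upgrade Notice ==" (1-based) as [version, notice lines], or null
 * when there are fewer than $n entries.
 */
function parse_nth_update_notice(string $content, int $n): ?array
{
    if ($n < 1) {
        return null;
    }

    $regexp = '~^==\h*Upgrade Notice\h*=='
        // Skip ahead to the n-th "= x.y.z =" heading; group 1 keeps the last version matched.
        . '(?:(?:\R(?!=\h*\d+\.\d+\.\d+\h*=$).*)*\R=\h*(\d+\.\d+\.\d+)\h*=$\s*){' . $n . '}'
        // Group 2: the block of non-empty lines that follows the heading.
        . '^((?:\h*\S.*(?:\R\h*\S.*)*)+)~m';

    if (!preg_match($regexp, $content, $matches)) {
        return null;
    }

    return [$matches[1], preg_split('~\R~', trim($matches[2]))];
}
```

With the readme from the question, `parse_nth_update_notice($readme, 2)` returns `['1.1.0', ['When using Master Pro, 1.1.0 is the new minimal required version!']]`. Because this selects by position, you would still compare `$matches[1]` with `$new_version` if you need a specific version.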