
Extract the paragraphs between two delimiter lines of a text file in PHP


I have a text file that is divided into sections by a special delimiter line, and I want to extract each section that sits between two of those lines. Here is an example of the text file:

---
 - ID: some random id

 \_______________________________\_
HELLO 
This is an example text.
I AM SECTION 1
\_______________________________\_
HELLO 
This is an example text.
I AM SECTION 2
\_______________________________\_
HELLO 
This is an example text.
I AM SECTION 3
\_______________________________\_
hello 
this is example text here
and i am section 4

The code below matches the delimiter lines, but I could not work out how to extract each section, including the last one, from the text file.

I need output like this:

[0] => ' HELLO 
         This is an example text.
         I AM SECTION 1',
[1] => ' HELLO 
         This is an example text.
         I AM SECTION 2',
[2] => ' HELLO 
         This is an example text.
         I AM SECTION 3',
[3] => ' HELLO 
         This is an example text.
         I AM SECTION 4',


public static function find_section_in_file($file = '', $directory = '')
{
    $response = ['error' => true, 'section' => NULL];
    if (isset($file) && isset($directory)) {
        $handle = fopen($directory."\\".$file, "r");
        $section = [];
        if ($handle) {

            while (($line = fgets($handle)) !== false) {
                $new_line = trim(preg_replace('/\s+/', ' ', $line));
                $start = self::startsWith($new_line, '\__');
                $end = self::endsWith($new_line, '_\_');

                if ($start && $end){
                    array_push($section, $line);
                }
            }
            fclose($handle);
            $response = ['error' => false, 'section' => $section];

        }
        // TODO: write a query to save the sections in the DB
    }
    return $response;
}

Solution

  • You could match each backslash/underscores delimiter line and then capture the lines that follow it, up to the next delimiter line, in capture group 1.

    ^\h*\\_+\\_\R((?:.*\R(?!\h*\\_+\\_).*)*)
    

    Explanation

    • ^ Start of a line (the pattern is used with the m flag in the example)
    • \h*\\_+\\_\R Match 0+ horizontal whitespace chars, \ , 1+ underscores, \, _ and a Unicode newline sequence
    • ( Capture group 1
      • (?: Non capture group
        • .*\R Match the whole line and a newline
        • (?!\h*\\_+\\_) Negative lookahead, assert that the line does not start with the backslash/underscores
        • .* Match the whole line
      • )* Close non capture group and repeat 0+ times
    • ) Close capture group
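
To make the pieces listed above easier to line up with the pattern, here is a sketch of the same regex written with the x (extended) flag, which allows whitespace and # comments inside the pattern. The variable `$sample` is made-up input for illustration, not the asker's file.

```php
<?php
// Same pattern as in the explanation, spread out with the x flag.
// In a single-quoted PHP string, four backslashes become the regex \\,
// which matches one literal backslash.
$re = '/
    ^ \h* \\\\ _+ \\\\ _ \R           # delimiter line: indent, backslash, underscores, backslash, underscore, newline
    (                                 # capture group 1: the section body
        (?:
            .* \R                     # rest of the current line and its newline
            (?! \h* \\\\ _+ \\\\ _ )  # the next line must not be a delimiter
            .*                        # consume that next line
        )*                            # repeat 0 or more times
    )
/mx';

// Hypothetical sample: a preamble line and two sections of two lines each.
$sample = "intro\n\\___\\_\nA\nB\n \\___\\_\nC\nD";

preg_match_all($re, $sample, $matches);
print_r($matches[1]);
```

The m flag is still needed so that ^ can match at the start of every line rather than only at the start of the string.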


    For example

    $re = '/^\h*\\\\_+\\\\_\R((?:.*\R(?!\h*\\\\_+\\\\_).*)*)/m';
    $str = '---
     - ID: some random id
    
     \\_______________________________\\_
    HELLO 
    This is an example text.
    I AM SECTION 1
    \\_______________________________\\_
    HELLO 
    This is an example text.
    I AM SECTION 2
    \\_______________________________\\_
    HELLO 
    This is an example text.
    I AM SECTION 3
    \\_______________________________\\_
    hello 
    this is example text here
    and i am section 4';
    
    preg_match_all($re, $str, $matches);
    print_r($matches[1]);
    

    Output

    Array
    (
        [0] => HELLO 
    This is an example text.
    I AM SECTION 1
        [1] => HELLO 
    This is an example text.
    I AM SECTION 2
        [2] => HELLO 
    This is an example text.
    I AM SECTION 3
        [3] => hello 
    this is example text here
    and i am section 4
    )
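
One caveat worth checking against your data: with the pattern above, a section that consists of a single line appears to be captured as an empty string, because the group only advances when a line is followed by another non-delimiter line. If that matters, or if you would rather keep reading the file line by line as in the question, the sections can also be collected without a multi-line regex. The following is a minimal sketch under that assumption; `is_delimiter_line` and `split_sections` are hypothetical helper names, not part of the original code.

```php
<?php
// Hypothetical helper: true for a delimiter line such as "\_______\_",
// optionally indented and with optional trailing spaces.
function is_delimiter_line(string $line): bool
{
    return preg_match('/^\h*\\\\_+\\\\_\h*$/', rtrim($line, "\r\n")) === 1;
}

// Hypothetical helper: groups lines into sections, one per delimiter line.
// Lines before the first delimiter are ignored, and the last section ends
// at the end of the input.
function split_sections(iterable $lines): array
{
    $sections = [];
    $current = null; // stays null until the first delimiter has been seen

    foreach ($lines as $line) {
        if (is_delimiter_line($line)) {
            if ($current !== null) {
                // Close the previous section; trailing whitespace is trimmed.
                $sections[] = rtrim(implode("\n", $current));
            }
            $current = [];
        } elseif ($current !== null) {
            $current[] = rtrim($line, "\r\n");
        }
    }

    if ($current !== null) {
        $sections[] = rtrim(implode("\n", $current)); // the last section
    }

    return $sections;
}
```

For a file on disk this could be called as `split_sections(file($path))`, since `file()` returns the lines of a file as an array; `$path` here stands for whatever the caller builds from `$directory` and `$file`. The returned array could then replace `$section` in the response of `find_section_in_file`.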