php, regex, split

Split by new lines which are followed by a number or a letter then a dot


I have a string with content like this:

19. Which of the following conflicting criteria does the problem below satisfe. 2.1
C++ pointers are powerful and very flexible but at the cost of poor intelligibility.
a.  Writability vs Readability
b.  Reliability vs Cost of execution
c.  Writability vs Reliability
d.  Cost of execution vs. Readability
e.  Cost of execution vs. Readability

I want to split it like this:

    [0] => 19.  Which of the following conflicting criteria does the problem below satisfye. 2.1
C++ pointers are powerful and very flexible but at the cost of poor intelligibility.

    [1] => a.   Writability vs Readability

    [2] => b.   Reliability vs Cost of execution

    [3] => c.   Writability vs Reliability

    [4] => d.   Cost of execution vs. Readability

    [5] => e.   Cost of execution vs. Readability

My regex skills are weak, and this is the result I am getting:

preg_split('/(?=[a-e\d]+\.(?!\d))/', $entries, -1, PREG_SPLIT_NO_EMPTY);

    [0] => 1
    [1] => 9.   Which of the following conflicting criteria does the problem below satisfy
    [2] => e. 2.1
C++ pointers are powerful and very flexible but at the cost of poor intelligibility.

    [3] => a.   Writability vs Readability

    [4] => b.   Reliability vs Cost of execution

    [5] => c.   Writability vs Reliability

    [6] => d.   Cost of execution vs. Readability

    [7] => e.   Cost of execution vs. Readability

How should I do this?


Solution

  • As I understand it, you want to split at one or more vertical whitespace characters (\v), but only where the following line starts with ^[a-e\d]+\. (one or more digits or letters a to e, then a dot). The preg_split function is fine for this:

    $pattern = '/\v+(?=^[a-e\d]+\.)/m';
    

    The m (multiline) flag makes the ^ caret match at the start of every line, not only at the start of the string.

    print_r(preg_split($pattern, $str));
    

    Test it at eval.in; it should give the desired result:

    Array
    (
        [0] => 19. Which of the following conflicting criteria does the problem below satisfe. 2.1
    C++ pointers are powerful and very flexible but at the cost of poor intelligibility.
        [1] => a.  Writability vs Readability
        [2] => b.  Reliability vs Cost of execution
        [3] => c.  Writability vs Reliability
        [4] => d.  Cost of execution vs. Readability
        [5] => e.  Cost of execution vs. Readability
    )
    

    Also see regex101 for testing the split sequence. If there are empty lines containing spaces between the entries, try \s+ (one or more of any whitespace) instead of \v+.
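
For completeness, here is a minimal self-contained sketch that puts the pieces above together. The sample text is copied from the question; the variable names ($parts, $spaced, $partsSpaced) and the three-space blank lines in $spaced are only assumptions made for illustration. It should show that the stray "e. 2.1" in the first line no longer causes a split, and that \s+ also swallows blank lines that contain spaces.

```php
<?php
// Sample input from the question, with entries separated by "\n".
$str = "19. Which of the following conflicting criteria does the problem below satisfe. 2.1\n"
     . "C++ pointers are powerful and very flexible but at the cost of poor intelligibility.\n"
     . "a.  Writability vs Readability\n"
     . "b.  Reliability vs Cost of execution\n"
     . "c.  Writability vs Reliability\n"
     . "d.  Cost of execution vs. Readability\n"
     . "e.  Cost of execution vs. Readability";

// Split at vertical whitespace, but only if the next line starts with
// digits or letters a-e followed by a dot. The m flag lets ^ match at
// every line start, so "e. 2.1" in the middle of a line is ignored.
$parts = preg_split('/\v+(?=^[a-e\d]+\.)/m', $str);
print_r($parts);

// Hypothetical variant: the same entries, separated by a blank line
// that contains three spaces.
$spaced = implode("\n   \n", $parts);

// \s+ consumes the newline, the spaces, and the second newline together,
// so the resulting elements carry no trailing whitespace.
$partsSpaced = preg_split('/\s+(?=^[a-e\d]+\.)/m', $spaced);
print_r($partsSpaced);
```

With \v+ on the spaced variant, the split still happens at the last newline before each entry, but the preceding element keeps the leftover newline and spaces at its end, which is why \s+ is the better choice for that kind of input.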