REGEX matching single line or multiline


I track my activity in a Markdown file.
At the end of each week, I need to produce a report on how much time I spent on each subject.

What I'm trying to do is the following:

  1. From a MASTERFILE (see https://pastebin.com/1Qs8f00M), produce a daily and detailed report of my activity.
  2. From those daily reports (expected result would be https://pastebin.com/Pn56B3Fb), extract my time report.

MASTERFILE:

## %XXX ProjectName1
<br>

- XXX : Restabat ut Caesar post haec properaret accitus et ...
- XXX : fictisque blanditiis hortabatur...
```
$ various_commands or reminder
```

- XXX : Restabat ut Caesar post haec properaret accitus et abstergendae causa suspicionis sororem suam, eius uxorem, quid moliretur haerebat. : CHRG=0.5
```
Novo denique perniciosoque exemplo idem Gallus ausus est inire flagitium grave, quod Romae cum ultimo dedecore temptasse aliquando dicitur Gallienus, et adhibitis paucis clam ferro succinctis vesperi per tabernas palabatur et conpita quaeritando Graeco sermone, cuius erat inpendio gnarus, quid de Caesare quisque sentiret. 

Sed ut tum ad senem senex de senectute, sic hoc libro ad amicum amicissimus scripsi de amicitia. Tum est Cato locutus, quo erat nemo fere senior temporibus illis, nemo prudentior; nunc Laelius et sapiens (sic enim est     habitus) et amicitiae gloria excellens de amicitia loquetur. 
```
<br>

## %YYY ProjectName2
<br>

- YYY : Restabat ut Caesar post haec properaret accitus et : CHRG=0.25
<br>

The latter is easy, since my tasks are structured:

- [ProjectCode] : some details : CHRG=0,5

The following works quite well:

Get-Content -Raw .\test.md |
    Select-String '(-.*CHRG=.*)' -AllMatches |
    Foreach {$_.Matches} |
    Foreach {$_.Value}

The former is harder: I can't seem to grasp the right regex to

  • match lines such as ## %XXX ProjectName1 and
  • match a block of lines that starts with a line containing CHRG= and ends with a line containing <br>.

Based on the question "Multiline regex to match config block", I tried the following, with no success so far. (I looked for a marker that would do double duty, since I run Pandoc on my .md files to produce .html files: two birds with one stone.)

Get-Content -Raw .\test.md |
    Select-String '(?smi)(^## %.*|^-\s.*CHRG=.*).*?<br>' -AllMatches |
    Foreach {$_.Matches} |
    Foreach {$_.Value}

The desired output would be:

## %XXX ProjectName1
<br>

- XXX : Restabat ut Caesar post haec properaret accitus et abstergendae causa suspicionis sororem suam, eius uxorem, quid moliretur haerebat. : CHRG=0.5
<code>
Novo denique perniciosoque exemplo idem Gallus ausus est inire flagitium grave, quod Romae cum ultimo dedecore temptasse aliquando dicitur Gallienus, et adhibitis paucis clam ferro succinctis vesperi per tabernas palabatur et conpita quaeritando Graeco sermone, cuius erat inpendio gnarus, quid de Caesare quisque sentiret.

Sed ut tum ad senem senex de senectute, sic hoc libro ad amicum amicissimus scripsi de amicitia. Tum est Cato locutus, quo erat nemo fere senior temporibus illis, nemo prudentior; nunc Laelius et sapiens (sic enim est habitus) et amicitiae gloria excellens de amicitia loquetur.
</code>
<br>

## %YYY ProjectName2
<br>

- YYY : Restabat ut Caesar post haec properaret accitus et : CHRG=0.25
<br>

The actual output is:

## %XXX ProjectName1
<br>

- XXX : Restabat ut Caesar post haec properaret accitus et ...
- XXX : fictisque blanditiis hortabatur...
<code>
$ various_commands or reminder
</code>

- XXX : Restabat ut Caesar post haec properaret accitus et abstergendae causa suspicionis sororem suam, eius uxorem, quid moliretur haerebat. : CHRG=0.5
<code>
Novo denique perniciosoque exemplo idem Gallus ausus est inire flagitium grave, quod Romae cum ultimo dedecore temptasse aliquando dicitur Gallienus, et adhibitis paucis clam ferro succinctis vesperi per tabernas palabatur et conpita quaeritando Graeco sermone, cuius erat inpendio gnarus, quid de Caesare quisque sentiret.

Sed ut tum ad senem senex de senectute, sic hoc libro ad amicum amicissimus scripsi de amicitia. Tum est Cato locutus, quo erat nemo fere senior temporibus illis, nemo prudentior; nunc Laelius et sapiens (sic enim est habitus) et amicitiae gloria excellens de amicitia loquetur.
</code>
<br>

## %YYY ProjectName2
<br>

- YYY : Restabat ut Caesar post haec properaret accitus et : CHRG=0.25
<br>

Solution

  • These parts of your regular expression

    (^## %.*|^-\s.*CHRG=.*).*?<br>
     ~~~~~~~               ~~~~~~~

    match everything from the first ## % up to the last <br>, because of the greedy .* in the alternation combined with the single-line modifier ((?s)). That modifier makes the dot match newlines as well, so ^## %.* matches ## % at the beginning of a line and everything after it, to the end of the input; the engine then backtracks only as far as the last <br>. Likewise, ^-\s.*CHRG= matches a hyphen and a space (any whitespace character, actually) at the beginning of a line and continues to the last occurrence of CHRG=, even if other lines beginning with a hyphen and a space lie in between.

    Try something like this:

    (?mi)^(## %.*|-\s.*CHRG=.*)[\s\S]*?<br>
    

    Removing the single-line modifier from the expression makes the alternation match only within a line (because . won't match newlines). The [\s\S]*? then does a non-greedy match of everything from the end of the line up to the next <br> (including newlines).
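
To see the difference between the two patterns, here is a minimal sketch that runs both against a shortened stand-in for the MASTERFILE. The sample text and the variable names ($text, $original, $fixed, $greedy, $blocks) are made up for illustration; only the two patterns come from the question and the answer above. It calls [regex]::Matches directly so that it runs without a file on disk.

```powershell
# Hypothetical, shortened stand-in for the MASTERFILE (not the real file).
$text = @'
## %XXX ProjectName1
<br>

- XXX : first task without a charge
- XXX : second task : CHRG=0.5
some notes
spanning lines
<br>

## %YYY ProjectName2
<br>

- YYY : third task : CHRG=0.25
<br>
'@

# Pattern from the question: (?s) lets the greedy .* run across newlines.
$original = '(?smi)(^## %.*|^-\s.*CHRG=.*).*?<br>'

# Pattern from the answer: no (?s), so .* stays within one line;
# [\s\S]*? lazily spans lines up to the next <br>.
$fixed = '(?mi)^(## %.*|-\s.*CHRG=.*)[\s\S]*?<br>'

$greedy = [regex]::Matches($text, $original)
$blocks = [regex]::Matches($text, $fixed)

"original pattern: $($greedy.Count) match(es)"
"fixed pattern:    $($blocks.Count) match(es)"

# Print each block found by the fixed pattern, separated by a blank line.
$blocks | ForEach-Object { $_.Value; '' }
```

On this sample, the original pattern should return a single match that swallows the whole text, while the fixed pattern should return four separate blocks and skip the task line that has no CHRG=. The same pattern can be dropped into the Select-String pipeline from the question in place of the original one.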