Suppose I have the following markdown
list items:
- [x] Example of a completed task.
- [x] ! Example of a completed task.
- [x] ? Example of a completed task.
I am interested to parse that item using regex
and extract the following group captures:
$1
: the left [
and the right ]
brackets when the symbol x
is in-between$2
: the symbol x
in between the brackets [
and ]
$3
: the modifier !
that follows after [x]
$4
: the modifier ?
that follows after [x]
$5
: the text that follows [x]
without a modifier, e.g., [x] This is targeted.
$6
: the text that follows [x] !
$7
: the text that follows [x] ?
After a lot of trial-and-error using online parsers, I came up with the following:
((?<=x)\]|\[(?=x]))|((?<=\[)x(?=\]))|((?<=\[x\]\s)!(?=\s))|((?<=\[x\]\s)\?(?=\s))|((?<=\[x\]\s)[^!?].*)|((?<=\[x\]\s!\s).*)|((?<=\[x\]\s\?\s).*)
To make the regex
above more readable, these are the capture groups listed one by one:
$1
: ((?<=x)\]|\[(?=x]))
$2
: ((?<=\[)x(?=\]))
$3
: ((?<=\[x\]\s)!(?=\s))
$4
: ((?<=\[x\]\s)\?(?=\s))
$5
: ((?<=\[x\]\s)[^!?].*)
$6
: ((?<=\[x\]\s!\s).*)
$7
: ((?<=\[x\]\s\?\s).*)
This is most likely not the best way to do it, but at least it seems to capture what I want:
I would like to extend that regex
to capture lines in a markdown
table that
looks like this:
| | Task name | Plan | Actual | File |
| :---- | :-------------------------------------- | :---------: | :---------: | :------------: |
| [x] | Task one with a reasonably long name. | 08:00-08:45 | 08:00-09:00 | [[task-one]] |
| [x] ! | Task two with a reasonably long name. | 09:00-09:30 | | [[task-two]] |
| [x] ? | Task three with a reasonably long name. | 11:00-13:00 | | [[task-three]] |
More specifically, I am interested in having the same group captures as above, but I would like to exclude the table grid (i.e., the |
). So, groups $1
to $4
should stay the same, but groups $5
to $7
should capture the text, excluding the |
, e.g., like in the selection below:
Do you have any ideas on how I can adjust, for example, the regex for group $5
to exclude the |
. I have endlessly tried all sorts of negations (e.g., [^\|]
). I am using Oniguruma regular expressions.
You can use
((?<=x)]|\[(?=x]))|((?<=\[)x(?=]))|((?<=\[x]\s)!(?=\s))|(?<=\[x]\s)(\?)(?=\s)|(?:\G(?!\A)\||(?<=\[x]\s[?!\s]\s\|))\K([^|\n]*)(?=\|)
See the regex101 PCRE and a Ruby (Onigmo/Oniguruma) demos.
What is added? The (?:\G(?!\A)\||(?<=\[x]\s[?!\s]\s\|))\K([^|\n]*)(?=\|)
part:
(?:
- start of a non-capturing group (a custom boundary here, we'll match...)
\G(?!\A)\|
- either the end of the previous match and a |
char (i.e. |
must immediately follow the previous match),|(?<=\[x]\s[?!\s]\s\|)
- or a location that is immediately preceded with [x]
+ a whitespace + a ?
, !
or whitespace + a whitespace and |
char)
- end of the group\K
- match reset operator that removes the text matched so far from the overall match memory buffer([^|\n]*)
- zero or more chars other than |
and a line feed char(?=\|)
- a |
char must appear immediately to the right of the current location.