I don't know how to write the equivalent of this negative lookahead as a `/` search in Neovim.
&(?!(?:apos|quot|[gl]t|amp);|#)
It works with the Silver Searcher (`ag`), but I want to search within a single file only, using `/`.
Your question is interesting, and I often have trouble myself with the regular expression syntax of Vim and other tools that don't use PCRE or another common syntax!
Googling a bit, I found this article about lookarounds in Vim.
As you can see, it's just a matter of syntax, again!
A negative lookahead such as `(?!amp;)` should be written `\(amp;\)\@!`.
This leads to something like this:
&\(\(\w\{2,6\}\|#\d\{1,6\}\|#[xX][0-9a-fA-F]\{1,6\}\);\)\@!
I match `&amp;`, `&apos;` with `&\w{2,6};` in PCRE, which becomes `&\w\{2,6\};` in Vim's syntax.
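To check the logic of the pattern outside Vim, here is a minimal sketch in Python, assuming its `re` module (whose lookahead syntax follows the PCRE style). The function name and the sample string are only illustrative.

```python
import re

# PCRE-style spelling of the Vim pattern above: an ampersand that does not
# start a named, decimal or hexadecimal character reference.
BARE_AMP = re.compile(r"&(?!(?:\w{2,6}|#\d{1,6}|#[xX][0-9a-fA-F]{1,6});)")


def bare_ampersand_offsets(text):
    """Return the offsets of ampersands that are not part of an entity."""
    return [m.start() for m in BARE_AMP.finditer(text)]


# Illustrative sample: two bare ampersands followed by three entities.
sample = "Glasses & sunscreen, H&M, &amp; &#169; &#xA9;"
print(bare_ampersand_offsets(sample))  # [8, 22]
```

Only the ampersands in "Glasses & sunscreen" and "H&M" are reported; the three entities are skipped.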
Tested that on this XML:
<note>
<author>Jani</author>
<heading>Reminder</heading>
<summary>Glasses & sunscreen</summary>
<body>
Don't forget to pack your glasses & sunscreen to go to
the beach tomorrow!
If you forget your glasses 😎 -> you'll damage your retina.
If you don't put some sunscreen on -> you'll probably get sun burnt 🌞.
Now, the problem is to match the ampersand in "H&M" 😬!
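&lt; = <
&gt; = >
&amp; = &
&nbsp; = non-breaking space
&iexcl; = ¡
&cent; = ¢
&pound; = £
&curren; = ¤
&yen; = ¥
&brvbar; = ¦
&sect; = §
&uml; = ¨
&copy; = ©
&ordf; = ª
&laquo; = «
&not; = ¬
&shy; = soft hyphen
&reg; = ®
&macr; = ¯
&deg; = °
&plusmn; = ±
&sup2; = ²
&sup3; = ³
&acute; = ´
&micro; = µ
&para; = ¶
&cedil; = ¸
&sup1; = ¹
&ordm; = º
&raquo; = »
&frac14; = ¼
&frac12; = ½
&frac34; = ¾
&iquest; = ¿
&times; = ×
&divide; = ÷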
</body>
</note>
In Vim, you have to escape parentheses, braces and pipes, but not square brackets! This is clearly not very readable. Perhaps there are some extensions that make it easier to use. I Googled a bit and found Perl compatible regular expressions in Vim.
I've started writing myself a note about the flavours of regular expression engines. It might be useful for others:
PCRE | sed and vim | Description |
---|---|---|
`.` | `.` | match any char |
`*` | `*` | 0 or more times |
`+` | `\+` | 1 or more times |
`?` | `\?` | 0 or 1 time |
`^` | `^` | start of line |
`$` | `$` | end of line |
`{3}` | `\{3\}` | 3 times |
`{3,}` | `\{3,\}` | 3 or more times |
`(regexp)` | `\(regexp\)` | Group matching "regexp" |
`[abc]` | `[abc]` | "a", "b", or "c" |
`[^abc]` | `[^abc]` | any char not "a", "b" or "c" |
`\2` | `\2` | back reference of group n°2 |
`(?: )` | `\%( \)` Vim✔️, sed❌ | non-capturing group |
`(?=this-after)` | `\(this-after\)\@=` Vim✔️, sed❌ | Positive lookahead |
`(?!not-this-after)` | `\(not-this-after\)\@!` Vim✔️, sed❌ | Negative lookahead |
`(?<=this-before)` | `\(this-before\)\@<=` Vim✔️, sed❌ | Positive lookbehind |
`(?<!not-this-before)` | `\(not-this-before\)\@<!` Vim✔️, sed❌ | Negative lookbehind |
It seems that sed doesn't handle lookarounds, but the syntax is very similar to Vim for most of the other cases.
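To see the four lookarounds side by side, here is a small sketch in Python, assuming its `re` module, which uses the PCRE-style spelling. The sample string is made up for the illustration, and the comments give the Vim spelling of each pattern.

```python
import re

# Made-up sample text for the illustration.
text = "foo1 bar2 foo3 baz4"

# Positive lookahead; in Vim: [a-z]\+\(2\)\@=
letters_before_2 = re.findall(r"[a-z]+(?=2)", text)

# Negative lookahead; in Vim: foo\(1\)\@!
foo_not_before_1 = [m.start() for m in re.finditer(r"foo(?!1)", text)]

# Positive lookbehind; in Vim: \(bar\)\@<=\d
digit_after_bar = re.findall(r"(?<=bar)\d", text)

# Negative lookbehind; in Vim: \(foo\)\@<!\d
digits_not_after_foo = re.findall(r"(?<!foo)\d", text)

print(letters_before_2)      # ['bar']
print(foo_not_before_1)      # [10]
print(digit_after_bar)       # ['2']
print(digits_not_after_foo)  # ['2', '4']
```

In every case the lookaround only checks the surrounding text; it is never part of what is returned.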
As Friedrich pointed out in a comment, Vim has helpful atoms to define the start and end of a match: `\zs` and `\ze`. You can place `\zs` anywhere in the pattern, and the match will only start from that position; likewise, the match ends at `\ze`. You can use both to say "find this specific pattern and only replace a part of it". Example with this text:
James Bond and James Cameron are well known, but not James Tartempion.
If you only want to uppercase "James" followed by " Bond" or " Cameron":
:s/James\ze \(Bond\|Cameron\)/JAMES/gi
But if you need negative lookarounds, the pattern can get more complicated to write this way, as you'll probably have to use negated character classes. In that case, I would use a negative lookaround to keep the pattern readable. Typically, to uppercase all "James" which aren't followed by " Tartempion":
:s/James\( Tartempion\)\@!/JAMES/gi
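For comparison, both substitutions can be sketched in Python, assuming its `re` module: the `\ze` version becomes a positive lookahead, and the `\@!` version becomes a negative lookahead. The variable names are illustrative; the `i` flag of `:s` is mapped to `re.IGNORECASE`.

```python
import re

text = "James Bond and James Cameron are well known, but not James Tartempion."

# Counterpart of  James\ze \(Bond\|Cameron\)  written as a positive lookahead.
with_positive = re.sub(
    r"James(?= (?:Bond|Cameron))", "JAMES", text, flags=re.IGNORECASE
)

# Counterpart of  James\( Tartempion\)\@!  written as a negative lookahead.
with_negative = re.sub(
    r"James(?! Tartempion)", "JAMES", text, flags=re.IGNORECASE
)

print(with_positive)
print(with_negative)
```

On this sample text both calls give the same result: the first two "James" are uppercased and the one before " Tartempion" is left alone.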
If Vim is built with Perl support (which was the case for me out of the box in Cygwin and Ubuntu), then you can simply use Perl regular expressions in Vim, for example for your problem of ampersands that need to be converted to HTML entities:
:perldo s/&(?!(?:#\d{2,6}|#x[0-9a-fA-F]{2,6}|\w{2,6});)/&amp;/g
And for the "James" not followed by " Tartempion" example:
:perldo s/James(?! +Tartempion)/JAMES/gi
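The ampersand substitution can also be tried outside Vim. This is a minimal sketch assuming Python's `re` module and the same pattern as the `:perldo` command for ampersands; the function name is only illustrative.

```python
import re

# Same pattern as in the :perldo command for ampersands.
BARE_AMP = re.compile(r"&(?!(?:#\d{2,6}|#x[0-9a-fA-F]{2,6}|\w{2,6});)")


def escape_bare_ampersands(text):
    """Replace ampersands that do not start an entity with '&amp;'."""
    return BARE_AMP.sub("&amp;", text)


once = escape_bare_ampersands("H&M &amp; &#169; Glasses & sunscreen")
twice = escape_bare_ampersands(once)
print(once)   # H&amp;M &amp; &#169; Glasses &amp; sunscreen
print(twice)  # identical to the first result
```

Because existing entities are skipped, running the substitution a second time doesn't double-escape anything.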