
How do you search/replace the nth occurrence in vim visual mode?


This works:

'<,'>s/\v\/\zs(\/)// 
'<,'>s/\v(\/)@<=\//BAR/

I was just wondering if there was an easier way to replace the nth occurrence with a {} or something in vim.

Replace the third forward slash '/'.

/dir1//fas//fooBar/
/dir2//\.foobar//fas/
/dir//.foo//fas/

How would I replace the fourth 'foo'?

foo foo foo foo foo foo foo
foo foo foo foo foo foo foo

Solution

  • For simplicity I'll discuss matching these patterns. Replacing them should work the same way, using the same pattern in a :s command.

    Replace the third forward slash '/'.

    With a one-character match this is easier, since you can use [^/] to find characters that are not part of the match.

    If you want to count matches, you need to start from the beginning of the line, so anchor with ^.

    At that point, you can match two instances of "not slashes" followed by a "slash", then one more run of "not slashes", and put a \zs right before the third slash to mark it as the start of the actual match.

    It's a bit unfortunate that / itself needs to be escaped as \/ when searching with /, but the resulting pattern is:

    /\v^%([^\/]*\/){2}[^\/]*\zs\/
    

    One common tip for patterns that include / is to search backwards using ? instead, so let's do that to improve readability:

    ?\v^%([^/]*/){2}[^/]*\zs/
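If you want to experiment with the counting idea outside Vim, here is a rough translation into Python's re module. It is a sketch, not Vim itself. Python has no \zs, so the sketch captures everything before the third slash in group 1 and writes that prefix back. (?:...) stands in for Vim's %(...). The name replace_third_slash is just illustrative.

```python
import re

# Vim:    ?\v^%([^/]*/){2}[^/]*\zs/
# Python has no \zs, so capture the prefix (up to the third slash) in group 1
# and put it back. (?:...) is a non-capturing group, like Vim's %(...).
THIRD_SLASH = re.compile(r"^((?:[^/]*/){2}[^/]*)/")

def replace_third_slash(line: str, repl: str) -> str:
    # Replace only the third '/' on the line; lines with fewer slashes are unchanged.
    return THIRD_SLASH.sub(lambda m: m.group(1) + repl, line, count=1)

for line in ["/dir1//fas//fooBar/", "/dir//.foo//fas/"]:
    print(replace_third_slash(line, "BAR"))
```

The ^ anchor does the same job as in the Vim version. Counting only works if the match starts at the beginning of the line.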
    

    The pattern items I used here that might be unfamiliar to some are:

    • %(...): Groups a pattern, same as (...) but doesn't create a capture group.
    • {2}: Matches the preceding pattern exactly twice.

    Remember we're using "very magic" mode with \v, so most of the above won't require backslashes.

    There's a neat shortcut that shortens the pattern above (and will help us later with the longer word). If \zs appears in multiple places in your pattern, the last one to match defines the actual start of the match. (See :help /\zs.)

    So we can simplify that to:

    ?\v^%([^/]*\zs/){3}
    

    We match "not slashes" followed by a "slash" three times. The \zs will only take effect on the last (third) match, so you'll end up matching the third slash on the line.
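Python has a close cousin of the "last \zs wins" rule. When a capture group sits inside a repeated group, only the capture from the final repetition is kept. A hypothetical helper, nth_slash_index, can use that to locate the n-th slash. It is a sketch for comparison only, with the Vim pattern shown in a comment:

```python
import re

# Vim: ?\v^%([^/]*\zs/){3}
# In Python, group 1 is inside a repeated group, so after {n} repetitions
# it holds only the last (n-th) slash, much like Vim keeps the last \zs.
def nth_slash_index(line: str, n: int) -> int:
    # Returns the index of the n-th '/' (n >= 1), or -1 if there are fewer.
    m = re.match(r"^(?:[^/]*(/)){%d}" % n, line)
    return m.start(1) if m else -1

print(nth_slash_index("/dir1//fas//fooBar/", 3))  # 6
```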

    Now let's move on to the more complicated case of matching a word:

    How would I replace the fourth 'foo'?

    Here we can't use [^...] to match "not foo". I mean, we could use something like \v([^f]|f[^o]|fo[^o]) but that grows quickly as the word you're matching grows. And there's a better way to do it.

    We can use a zero-width negative look-behind! See :help /\@<! for this interesting operator (in very magic mode it's written @<!). In short, it takes the preceding atom (here, a group containing the word) and makes sure that atom does not match ending at the current position.

    So we can use this:

    /\v^%(%(.%(foo)@<!)*\zsfoo){4}
    

    The %(foo)@<! here ensures that each . we match will not be the last o in foo. That way we can accurately count the first, second, third and fourth foo on the line and make sure we won't match the fifth, sixth or seventh.

    Here again we're using the trick of repeating it four times (to find the fourth match) and having the last \zs stick.
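Here is a Python sketch of the same idea. It assumes Python's fixed-width (?<!foo) behaves like Vim's %(foo)@<! for a plain word. replace_nth_word is a hypothetical helper. It finds the n-th occurrence with a repeated group and splices in the replacement, since Python has no \zs.

```python
import re

def replace_nth_word(line: str, word: str, n: int, repl: str) -> str:
    # Vim:    /\v^%(%(.%(foo)@<!)*\zsfoo){4}
    # Python: (?<!...) is a fixed-width negative look-behind, like %(foo)@<!,
    # so no '.' can be the last character of an occurrence of the word.
    # Group 1 keeps only its last repetition, i.e. the n-th occurrence.
    w = re.escape(word)
    pattern = r"^(?:(?:.(?<!%s))*(%s)){%d}" % (w, w, n)
    m = re.match(pattern, line)
    if m is None:
        return line  # fewer than n occurrences: leave the line alone
    start, end = m.span(1)
    return line[:start] + repl + line[end:]

print(replace_nth_word("foo foo foo foo foo foo foo", "foo", 4, "BAR"))
# foo foo foo BAR foo foo foo
```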

    Note that the negative look-behind works well with a fixed word, but if you start using multis such as * or +, things get a lot more complicated. Take a look at the help for the operator and its warning that it can be slow. There is also a variant of the operator that limits how far back it will look, which you don't strictly need when matching a fixed word, but which may help with a more general match.

    One interesting test case is a word that repeats part of itself, such as fofo, in text where occurrences can overlap, such as fofofo or fofofofo.

    In fact, testing on those made me see that the pattern above will actually prefer to match the second occurrence in fofofo rather than the first one, if that's the fourth occurrence of fofo in that line. That's because the * operator is greedy. We can fix that by using {-} instead, which matches the shortest sequence possible.

    Fixing that bug, we get:

    /\v^%(%(.%(foo)@<!){-}\zsfoo){4}
    

    This is general enough that you can probably use it with any fixed word, or even with a pattern that has a few variations (e.g. case, plurals, alternative spellings).
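The Python sketch below shows the greedy versus shortest-match difference concretely. It is a translation, not Vim itself: Python writes Vim's {-} as the lazy *?. The sample line is made up for illustration, and nth_word_start is a hypothetical helper that returns the index of the n-th occurrence.

```python
import re

def nth_word_start(line: str, word: str, n: int, lazy: bool) -> int:
    # Mirrors the two Vim patterns (with fofo as the word):
    #   greedy: /\v^%(%(.%(fofo)@<!)*\zsfofo){4}
    #   lazy:   /\v^%(%(.%(fofo)@<!){-}\zsfofo){4}
    w = re.escape(word)
    star = "*?" if lazy else "*"
    pattern = r"^(?:(?:.(?<!%s))%s(%s)){%d}" % (w, star, w, n)
    m = re.match(pattern, line)
    return m.start(1) if m else -1

line = "fofo fofo fofo fofofo"
print(nth_word_start(line, "fofo", 4, lazy=False))  # 17: second fofo in "fofofo"
print(nth_word_start(line, "fofo", 4, lazy=True))   # 15: first fofo in "fofofo"
```

On that line the greedy version lands on the overlapping fofo that starts two characters later. The lazy version picks the earlier one.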