Tags: regex, rust, regex-lookarounds

What's the most sensible way to emulate lookaround behavior in Rust regex?


The Rust regex crate states:

This crate provides a native implementation of regular expressions that is heavily based on RE2 both in syntax and in implementation. Notably, backreferences and arbitrary lookahead/lookbehind assertions are not provided.

As of this writing, "rust regex lookbehind" comes back with no results from DuckDuckGo.

I've never had to work around this before, but I can think of two approaches:

Approach 1 (forward)

  1. Iterate over .captures_iter() for the pattern I want to use as lookbehind.
  2. Match the thing I actually want in the text between consecutive captures.

Approach 2 (reverse)

  1. Match the pattern I really want to match.
  2. For each match, search the text before it for the lookbehind pattern, going back no further than the end byte of the previous match or the beginning of the string.

Not only does this seem like a huge pain, it also seems like a lot of edge cases are going to trip me up. Is there a better way to go about this?

Example

Given a string like:

"Fish33-Tiger2Hyena4-"

I want to extract ["33-", "2", "4-"] iff each one follows a string like "Fish".


Solution

  • If the lookbehind you need is a known, consistent pattern, another workaround is to call .split() with the lookbehind-matching pattern as the argument (similar to the idea mentioned in the other answer). Every piece after the first is then the text that immediately follows a lookbehind match; the first piece is whatever precedes the first match.

    I can't speak to performance, but this at least means you can run a lookbehind-free regex on the split result, either N times (once per piece, for N pieces) or once on the concatenated result, as needed.