regex, kotlin, lookbehind

Positive lookbehind in Kotlin doesn't work in match


I'm iterating over this file:

[INFO] com.demo:communication:jar:3.5.0-SNAPSHOT
[INFO] +- com.cellwize.optserver:optserver-admin:jar:3.5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.logging.log4j:log4j-api:jar:2.7:compile
[INFO] |  +- org.apache.logging.log4j:log4j-core:jar:2.7:compile
[INFO] |  |  \- (org.apache.logging.log4j:log4j-api:jar:2.7:compile - omitted for duplicate)
[INFO] |  +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.7:compile
[INFO] |  |  +- org.slf4j:slf4j-api:jar:1.7.21:compile
[INFO] |  |  \- (org.apache.logging.log4j:log4j-api:jar:2.7:compile - omitted for duplicate)

I want to remove the prefix from every line: "[INFO] ", "[INFO] +- ", "[INFO] |  |  \- ", etc.

I call this function, which I wrote, on every line of the file:

private fun extractDependency(raw: String): Dependency {
    val uniqueDependencyRegex = Regex.fromLiteral("(?<=\\+- ).*")
    val duplicateDependencyRegex = Regex.fromLiteral("(?<=\\().+?(?=\\))")
    val projectRegex = Regex.fromLiteral("(?<=\\[INFO\\] ).*")

    when {
        uniqueDependencyRegex matches raw -> {
            val matchResult = uniqueDependencyRegex.matchEntire(raw)
            println(matchResult)
        }
        duplicateDependencyRegex matches raw -> {
            val matchResult = duplicateDependencyRegex.matchEntire(raw)
            println(matchResult)
        }
        projectRegex matches raw -> {
            val matchResult = projectRegex.matchEntire(raw)
            println(matchResult)
        }
        else -> {
            // TODO - throw exception
        }
    }

    return Dependency("test", "test", "test", "test")
}

I expected it to work, because I tested each regular expression separately beforehand (first condition, second condition, third condition).

The result I want is:

com.demo:communication:jar:3.5.0-SNAPSHOT
com.cellwize.optserver:optserver-admin:jar:3.5.0-SNAPSHOT:compile
org.apache.logging.log4j:log4j-api:jar:2.7:compile
org.apache.logging.log4j:log4j-core:jar:2.7:compile
org.apache.logging.log4j:log4j-api:jar:2.7:compile - omitted for duplicate
org.apache.logging.log4j:log4j-slf4j-impl:jar:2.7:compile
org.slf4j:slf4j-api:jar:1.7.21:compile
org.apache.logging.log4j:log4j-api:jar:2.7:compile - omitted for duplicate

Solution

  • You could match either [INFO] at the start of the line followed by one or more characters from the character class [| +\\(-], or a ) at the end of the line.

    In the replacement, use an empty string.

    ^\[INFO\][| +\\(-]+|\)$
    

    With the backslashes doubled, as needed inside a regular Kotlin string literal:

    ^\\[INFO\\][| +\\\\(-]+|\\)$
    

    regex demo
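
As a minimal sketch of how the pattern above could be applied in Kotlin: the pattern is compiled with the `Regex(...)` constructor, because `Regex.fromLiteral(...)` treats its argument as literal text rather than as a pattern. Note also that `matches` and `matchEntire` require the whole input to match, so a pattern that starts with a lookbehind can only succeed with `find`. The names `prefixOrClosingParen` and `stripPrefix` are hypothetical and not part of the original post.

```kotlin
// Hypothetical helper names; only the pattern itself comes from the answer above.
// Regex(...) compiles the pattern, whereas Regex.fromLiteral(...) would escape it.
val prefixOrClosingParen = Regex("^\\[INFO\\][| +\\\\(-]+|\\)$")

// Removes the leading "[INFO] ..." tree prefix and a trailing ")" if present.
fun stripPrefix(line: String): String =
    prefixOrClosingParen.replace(line, "")
```

Calling `stripPrefix` on each line of the file should then leave only the dependency coordinates, including the "- omitted for duplicate" suffix where it occurs.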


    A slightly more precise pattern repeatedly matches only the tree markers that actually occur (|, +- or \-) and captures the content in group 1, between optional parentheses. Then use that group in the replacement.

    ^\[INFO\](?:(?: +(?:\||\+-|\\-))+)? +\(?(.*?)\)?$
    

    Regex demo
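
The capture-group variant could be used along these lines. This is only a sketch: `dependencyLine` and `extractCoordinates` are hypothetical names, and returning `null` for lines that do not match is an assumption about how unexpected input should be handled.

```kotlin
// Hypothetical names; the pattern is the capture-group pattern from the answer,
// with backslashes doubled for a regular Kotlin string literal.
val dependencyLine =
    Regex("^\\[INFO\\](?:(?: +(?:\\||\\+-|\\\\-))+)? +\\(?(.*?)\\)?$")

// Returns the content of group 1 (the dependency coordinates),
// or null when the line does not look like a dependency-tree line.
fun extractCoordinates(line: String): String? =
    dependencyLine.find(line)?.groupValues?.get(1)
```

Because the pattern is anchored with ^ and $, `find` and `matchEntire` behave the same way here; the `null` result gives the caller a place to throw the exception mentioned in the question's TODO.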