Tags: regex, go, parsing, multiline

Golang multiline regexp parsing issue


I am creating a project in Go that parses Solidity code. In it, I wrote a function analyzeFile() that, for each smart contract (.sol file), statically detects issues using regular expressions:

func analyzeFile(issues []Issue, file string) (map[string][]Finding, error) {
    findings := make(map[string][]Finding)
    readFile, err := os.Open(file)
    if err != nil {
        return nil, err
    }
    defer readFile.Close()
    contents, _ := ioutil.ReadFile(file)
    scanner := bufio.NewScanner(readFile)
    lineNumber := 0
    for scanner.Scan() {
        lineNumber++
        line := scanner.Text()
        for _, issue := range issues {
            if issue.ParsingMode == "SingleLine" {
                matched, _ := regexp.MatchString(issue.Pattern, line)
                if matched {
                    findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                        IssueIdentifier: issue.Identifier,
                        File:            file,
                        LineNumber:      lineNumber,
                        LineContent:     strings.TrimSpace(line),
                    })
                }
            }
        }
    }

When a regex only needs to check code on a single line, everything works. However, I also need to detect constructs in the .sol files that span multiple lines, for instance this piece of code:

require(
  _disputeID < disputeCount &&
  disputes[_disputeID].status == Status.Active,
  "Disputes::!Resolvable"
);

I tried to add the following code in the analyzeFile() function:

    contents, _ := ioutil.ReadFile(file)
    for _, issue := range issues {
        if issue.ParsingMode == "MultiLine" {
            contents_to_string := string(contents)
            //s := strings.ReplaceAll(contents_to_string, "\n", " ")
            //sr := strings.ReplaceAll(s, "\r", " ")
            r := regexp.MustCompile(`((require)([(])\n.*[&&](?s)(.*?)([;]))`)
            finds := r.FindStringSubmatch(contents_to_string)
            for _, find := range finds {
                findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                    IssueIdentifier: issue.Identifier,
                    File:            file,
                    LineContent:     (find),
                })
            }
        }
    }

But I get wrong results: once the source code is converted to a string, all of it ends up as a single string containing \n line-break characters, which makes every regex check fail.


Solution

  • One workaround is to capture the whole argument list with (?s)require\((.*?)\); and then split the captured group on \n:

    
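    package main

    import (
        "fmt"
        "regexp"
        "strings"
    )

    func main() {
        // (?s) lets . match newlines, so the group captures everything
        // between "require(" and the first ");".
        var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
        var str = `require(
      _disputeID < disputeCount &&
      disputes[_disputeID].status == Status.Active,
      "Disputes::!Resolvable"
    );`

        matches := re.FindAllStringSubmatch(str, -1)
        for _, match := range matches {
            // match[1] is the captured argument list; print it line by line.
            lines := strings.Split(match[1], "\n")
            for _, line := range lines {
                fmt.Println(line)
            }
        }
    }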
    

    https://go.dev/play/p/Omn5ULHun_-


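    To match the individual lines with a regex instead, (?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$ could be used. In multiline mode (?m), ^ and $ match at line boundaries, and [^\S\r\n] matches horizontal whitespace only. We can apply this multiline pattern to the content captured between require( and );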

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Outer pattern: capture everything between "require(" and ");".
        var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
        var str = `require(
      _disputeID < disputeCount &&
      disputes[_disputeID].status == Status.Active,
      "Disputes::!Resolvable"
    );`

        // Inner pattern: (?m) makes ^ and $ match at each line boundary.
        var multilineRe = regexp.MustCompile(`(?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$`)
        matches := re.FindAllStringSubmatch(str, -1)
        for _, match := range matches {
            submatches := multilineRe.FindAllStringSubmatch(match[1], -1)
            for _, submatch := range submatches {
                // submatch[0] is the full text of one matched line.
                fmt.Println(submatch[0])
            }
        }
    }
    

    https://go.dev/play/p/LJsVy5vN6Ej
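
To connect this back to analyzeFile() from the question, the sketch below shows one possible way to turn each multi-line match into a finding with a line number. Note that FindStringSubmatch, used in the question, returns only the first match followed by its capture groups, so ranging over its result iterates over groups rather than over separate matches. The helper name findMultiLine, the reduced Finding struct, and the sample input are assumptions made for illustration; they are not part of the original project.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Finding is a reduced, hypothetical version of the struct from the question.
type Finding struct {
	LineNumber  int
	LineContent string
}

// requireRe matches a whole require(...); call, including line breaks.
var requireRe = regexp.MustCompile(`(?s)require\((.*?)\);`)

// findMultiLine (hypothetical helper) returns one Finding per match of re
// in contents, with the line number on which the match starts.
func findMultiLine(re *regexp.Regexp, contents string) []Finding {
	var findings []Finding
	// FindAllStringIndex returns the [start, end] byte offsets of every match.
	for _, loc := range re.FindAllStringIndex(contents, -1) {
		start, end := loc[0], loc[1]
		findings = append(findings, Finding{
			// Line number = count of newlines before the match, plus one.
			LineNumber:  strings.Count(contents[:start], "\n") + 1,
			LineContent: contents[start:end],
		})
	}
	return findings
}

func main() {
	// Made-up sample input with one multi-line and one single-line call.
	src := "contract C {\n  function f() public {\n    require(\n      a < b &&\n      c == d,\n      \"msg\"\n    );\n    require(x, \"y\");\n  }\n}\n"
	for _, f := range findMultiLine(requireRe, src) {
		fmt.Printf("line %d: %q\n", f.LineNumber, f.LineContent)
	}
}
```

Counting newlines before the match offset keeps the whole file in one string, so no line-by-line scanning is needed for the multi-line rules. If the files may use Windows line endings, the count still works because each \r\n contains exactly one \n.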