Tags: javascript, python, regex, cucumber, gherkin

Extracting code using RegEx and Python from a JavaScript function


I'm currently parsing some Gherkin files along with their associated step definition files. I'm wondering what the best way would be to extract the RegEx inside each step along with its code. For example, I have the following functions:

    this.Given(/^I create an SNS topic with name "([^"]*)"$/, function(name, callback) {
      var world = this;
      this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
        world.topicArn = resp.data.TopicArn;
      });
    });

    this.Given(/^I list the SNS topics$/, function(callback) {
      this.request(null, 'listTopics', {}, callback);
    });

I want to extract both the regex ^I create an SNS topic with name "([^"]*)"$ and the function body:

    var world = this;
    this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
      world.topicArn = resp.data.TopicArn;
    });

I've been able to extract the regex with the following pattern: 'this\.(?:Given|Then|When)\(/(.+?)/'
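For reference, here is that pattern used from Python, written with the backslashes that escape the literal . and ( (unescaped, . means "any character" and ( opens a group, which leaves the pattern's parentheses unbalanced). This is a sketch: SOURCE is a hypothetical stand-in for a step-definition file, not the asker's actual code.

```python
import re

# Hypothetical stand-in for a step-definition file (the two steps from the question).
SOURCE = r'''
this.Given(/^I create an SNS topic with name "([^"]*)"$/, function(name, callback) {
  var world = this;
  this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
    world.topicArn = resp.data.TopicArn;
  });
});

this.Given(/^I list the SNS topics$/, function(callback) {
  this.request(null, 'listTopics', {}, callback);
});
'''

# "\." and "\(" match a literal dot and parenthesis; the lazy group stops at the
# first "/" after the opening one, i.e. the end of the step's regex literal.
STEP_RE = re.compile(r"this\.(?:Given|Then|When)\(/(.+?)/")

# findall returns only the captured group: the step regex without its slashes.
patterns = STEP_RE.findall(SOURCE)
for p in patterns:
    print(p)
```

The lazy (.+?) breaks on step regexes that themselves contain an escaped slash (\/), which the sample does not.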

However, extracting the function code is a lot trickier. How can I extract everything from the first { to the last } of the function? Is there a better way to do this, for example a library that can extract it automatically?
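What "the first { to the last }" really means here is the { that opens the callback and the } that matches it, so counting brace depth is the natural fix. Below is a minimal Python sketch of that idea, assuming the step definitions look like the sample above. SOURCE, STEP_HEAD_RE, matching_brace and extract_steps are hypothetical names, and the scanner skips quoted strings and comments but does not understand regex literals or ${...} inside template literals, so it is a heuristic rather than a JavaScript parser.

```python
import re
from typing import Iterator, Tuple

# Hypothetical stand-in for a step-definition file (the two steps from the question).
SOURCE = r'''
this.Given(/^I create an SNS topic with name "([^"]*)"$/, function(name, callback) {
  var world = this;
  this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
    world.topicArn = resp.data.TopicArn;
  });
});

this.Given(/^I list the SNS topics$/, function(callback) {
  this.request(null, 'listTopics', {}, callback);
});
'''

# Step keyword, captured step regex, then the callback header up to its opening "{".
STEP_HEAD_RE = re.compile(
    r"this\.(?:Given|Then|When)\(/(.+?)/\s*,\s*function\s*\([^)]*\)\s*\{"
)


def matching_brace(src: str, open_idx: int) -> int:
    """Return the index of the '}' that closes the '{' at open_idx.

    Quoted strings and // or /* */ comments are skipped; regex literals and
    template-literal interpolation are not understood (heuristic, not a parser).
    """
    depth = 0
    i = open_idx
    while i < len(src):
        ch = src[i]
        if ch in "\"'`":
            quote = ch
            i += 1
            # Advance to the closing quote, stepping over backslash escapes.
            while i < len(src) and src[i] != quote:
                i += 2 if src[i] == "\\" else 1
        elif src.startswith("//", i):
            i = src.find("\n", i)  # line comment runs to the end of the line
            if i == -1:
                break
        elif src.startswith("/*", i):
            i = src.find("*/", i + 2) + 1  # land on the "/" of "*/"
            if i == 0:
                break
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("no matching '}' found")


def extract_steps(src: str) -> Iterator[Tuple[str, str]]:
    """Yield (step_regex, body) pairs; body is the text between the outer braces."""
    for m in STEP_HEAD_RE.finditer(src):
        open_idx = m.end() - 1  # index of the callback's "{"
        close_idx = matching_brace(src, open_idx)
        yield m.group(1), src[open_idx + 1:close_idx]


steps = list(extract_steps(SOURCE))
for pattern, body in steps:
    print(pattern)
    print(body.strip())
```

Because STEP_HEAD_RE consumes the step's regex literal before counting starts, the quote characters inside /^...name "([^"]*)"$/ never reach the brace counter.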


Solution

  • Regular expressions are not suited to correctly parsing general programs (1). You should use a JavaScript parser instead.

    Another option is to rely on a proxy, for example:

    • split your file into chunks of lines, each starting with this.Given(;
    • in each chunk, keep whatever lies between that this.Given( and the last }); as the "function body".

    This simplistic approach has some obvious blind spots (that's why I called it "a proxy"), among them:

    • it won't work if you happen to have nested this.Given( statements;
    • a }); inside a comment would be taken as the end of the step if it is the last one in the chunk;
    • it would incorrectly include the code of any other function declared between two this.Given( statements.

    But if your code has a regular structure, this may be quicker to implement than a complete JavaScript parser.
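The line-chunk proxy could look roughly like this in Python. It is a sketch under assumptions: SOURCE, CHUNK_START, HEAD_RE and proxy_steps are hypothetical names, only this.Given( starts a chunk (as in the description), and a second regex separates the step regex from the callback header. It inherits every blind spot listed above.

```python
import re
from typing import Iterator, Tuple

# Hypothetical stand-in for a step-definition file (the two steps from the question).
SOURCE = r'''
this.Given(/^I create an SNS topic with name "([^"]*)"$/, function(name, callback) {
  var world = this;
  this.request(null, 'createTopic', {Name: name}, callback, function (resp) {
    world.topicArn = resp.data.TopicArn;
  });
});

this.Given(/^I list the SNS topics$/, function(callback) {
  this.request(null, 'listTopics', {}, callback);
});
'''

# Zero-width split (Python 3.7+): every chunk begins at a "this.Given(".
CHUNK_START = re.compile(r"(?=this\.Given\()")
# Separates the step regex from the callback header; DOTALL lets the body span lines.
HEAD_RE = re.compile(r"/(.+?)/\s*,\s*function\s*\([^)]*\)\s*\{(.*)", re.DOTALL)


def proxy_steps(src: str) -> Iterator[Tuple[str, str]]:
    """Yield (step_regex, body) using the 'last }); in the chunk' proxy."""
    for chunk in CHUNK_START.split(src):
        if not chunk.startswith("this.Given("):
            continue  # text before the first step
        end = chunk.rfind("});")  # assumed to close the step: the proxy's weak spot
        if end == -1:
            continue
        inner = chunk[len("this.Given("):end]
        m = HEAD_RE.match(inner)
        if m:
            yield m.group(1), m.group(2)


for pattern, body in proxy_steps(SOURCE):
    print(pattern)
    print(body.strip())
```

Compared with brace counting, the only structural assumption is that each step ends with its own }); before the next this.Given(, which is exactly what a regularly formatted step file provides.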


    (1): Programming languages generally fall into the "context-free" or "context-sensitive" language classes, while regular expressions can only recognize "regular" languages.