javascript regex xregexp

XRegExp.matchRecursive - add callback functionality to allow for multiple identical instances


I am using XRegExp to parse a text file, specifically to find the contents between two sets of pre-defined comment tags. I'm not able to change these tags, so I need to find a way to make it work with the text provided.

I find a list of all of the tags using the regex provided (example in link also includes sample content): https://regex101.com/r/kCwyok/1/

I've then used XRegExp's matchRecursive function to get all the content between the opening and closing tags, which works almost perfectly.

// Map the list of component tags and extract data from them
return generateComponentList(data).map((component) => {
    console.log(chalk.blue('Processing', component[1], 'component.'))
    const contents = XRegExp.matchRecursive(data, '<!-- @\\[' + component[1] + '\\][.\\w-_+]* -->', '<!-- @\\[/' + component[1] + '\\] -->', 'g')
    let body = ''
    let classes = ''

    contents.map((content) => {
      const filteredContent = filterContent(content)
      body = filteredContent.value
      classes = cleanClasses(component[2])
      console.log(chalk.green(component[1], 'processing complete.'))
    })

    // Output the content as a JSON object
    return {
      componentName: component[1],
      classes,
      body
    }
  })

The problem I have is that the CodeExample tag exists twice. The tag is identical but the content is different. Because matchRecursive doesn't appear to have a callback function, it runs the match on all instances of that component at the same time, so whether there are 1 or 10 instances of CodeExample, the content for all of them is returned.

Is there a way I can actually add some sort of callback to matchRecursive? Failing that, is there a way I can make JavaScript understand which instance of CodeExample is being looked at, so I can just reference the array position directly? I presume XRegExp has an idea of which number CodeExample tag it's looking at, so is there a way to capture it?

Here is the full code for the sake of clarity: https://pastebin.com/2MpdvdNA

The desired output is a JSON file with the following data:

[
{
 componentName: "hero",
 classes: "",
 body: "# Creating new contexts"
},
{
 componentName: "CodeExample",
 classes: "",
 body: "## Usage example

    ```javascript
      Import { ICON_NAME } from 'Icons'
    ```"
},
{
 componentName: "ArticleSection",
 classes: "",
 body: // This section is massive and not relevant to question so skipping
},
{
 componentName: "NoteBlock",
 classes: ["warning"],
 body: "> #### Be Careful
> Eu laboris eiusmod ut exercitation minim laboris ipsum magna consectetur est [commodo](/nope)."
},
{
 componentName: "CodeExample",
 classes: "",
 body: "#### Code example
```javascript
  class ScrollingList extends React.Component {
      constructor(props) {
        super(props);
        this.listRef = React.createRef();
      }

      render() {
        return (
          &#60;div ref={this.listRef}&#62;{/* ...contents... */}&#60;/div&#62;
        );
      }
    }
```"
}
// Skipping the rest as not relevant to question
]

Sorry if I've not explained this clearly, I've been looking at this for far too long.


Solution

  • This is how it was resolved in the end:

    import XRegExp from 'xregexp'
    
    const extractComponents = data => {
      const components = []
      const re = '<!-- @\\[(\\w+)\\]([.\\w-_+]+)* -->'
    
      XRegExp.forEach(data, XRegExp(re, 'g'), match => {
        const name = match[1]
        const classes = match[2]
    
        // Number of earlier components with the same name, used as a 0-based index
        const instance = components.filter(item => item.name === name).length
    
        components.push({
          name,
          classes,
          instance
        })
      })
    
      return components
    }
    
    const cleanClasses = classes => {
      const filteredClasses = classes ? classes.split('.') : []
      filteredClasses.shift()
    
      return filteredClasses
    }
    
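    const extractContent = (data, component) => {
      const re = `<!-- @\\[${component.name}\\][.\\w-_+]* -->`
      const re2 = `<!-- @\\[/${component.name}\\] -->`
    
      // matchRecursive returns one entry per matched tag pair, in document
      // order, so the instance index picks out this component's own content
      return XRegExp.matchRecursive(
        data,
        re, re2, 'g'
      )[component.instance]
    }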
    
    const parseComponents = data => {
      return extractComponents(data).map(component => {
        return {
          componentName: component.name,
          classes: cleanClasses(component.classes),
          body: extractContent(data, component)
        }
      })
    }
    
    export default parseComponents
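
To make the instance numbering above easier to follow, here is a minimal sketch of the same idea that runs without XRegExp. It is not the code from the answer: it swaps XRegExp.forEach for the built-in String.prototype.matchAll, keeps the per-name counter in a Map instead of filtering the array on every match, and uses a slightly simplified pattern for the class suffix. The sample text and the name listOpeningTags are made up for illustration.

```javascript
// Hypothetical sample input; the tag syntax follows the question.
const sample = [
  '<!-- @[hero] -->',
  '# Creating new contexts',
  '<!-- @[/hero] -->',
  '<!-- @[CodeExample] -->',
  '## Usage example',
  '<!-- @[/CodeExample] -->',
  '<!-- @[NoteBlock].warning -->',
  '> #### Be Careful',
  '<!-- @[/NoteBlock] -->',
  '<!-- @[CodeExample] -->',
  '#### Code example',
  '<!-- @[/CodeExample] -->'
].join('\n')

// List every opening tag with a per-name instance index: 0 for the first
// occurrence of a name, 1 for the second, and so on.
const listOpeningTags = data => {
  const seen = new Map() // name -> number of occurrences so far
  // Simplified pattern (an assumption): a name, then zero or more ".class" parts.
  // Closing tags are skipped because "/" is not matched by \w.
  const openTag = /<!-- @\[(\w+)\]((?:\.[\w+-]+)*) -->/g

  return Array.from(data.matchAll(openTag), match => {
    const name = match[1]
    const instance = seen.get(name) || 0
    seen.set(name, instance + 1)
    // '.warning.big' -> ['warning', 'big']; no suffix -> []
    const classes = match[2].split('.').filter(Boolean)
    return { name, classes, instance }
  })
}

const tags = listOpeningTags(sample)
console.log(tags)
```

With this sample, the two CodeExample entries get instance 0 and 1, which is the index the answer then uses to pick the right element out of the array returned by matchRecursive.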