Tags: javascript, html, node.js, tags, extract

How can I find HTML-like tags in a string using JavaScript?


I have the following string:

var originalStr = "Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end."

What's the best way to identify all tags, the corresponding tag names and their content? This is the kind of result I'm looking for:


var tagsFound = 
    [ { "tagName": "firstTag",  "value": "text inside first tag" } 
    , { "tagName": "secondTag", "value": "50" } 
    ] 

Solution

  • Depending on the complexity of the strings you are dealing with, a simple regex solution might work (it works nicely for your string):

    var str = 'Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end.';
    
    var tagsFound = [];
    str.replace(/<([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*>(.*?)<\/\1>/g, function(m,m1,m2){
        // write data to the result object
        tagsFound.push({
            "tagName": m1,
            "value": m2
        })
        // return the match unchanged so the string itself is not modified
        return m;
    });
    
    // Displaying the results
    for(var i=0;i<tagsFound.length; i++){
        console.log(tagsFound[i]);
    }

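    This breaks down once self-closing tags or tags containing other tags come into play, like <selfClosedTag/> or <tag><tag>something</tag>else</tag>. Self-closing tags are skipped, tags nested inside a match are not reported separately, and when a tag contains another tag of the same name the match ends at the first closing tag.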
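
If you do need to cope with self-closing and nested tags, one option is to scan for tag tokens and keep a stack of open tags instead of matching a whole element with a single regex. The following is only a rough sketch under some assumptions: `extractTags` is a hypothetical helper name (not part of the answer above), attribute values are assumed not to contain `>`, and stray closing tags are simply ignored. For real HTML or XML a proper parser is the safer choice.

```javascript
// Hypothetical helper, not part of the original answer.
// Assumes attribute values never contain ">".
function extractTags(str) {
    // Matches one tag token: opening, closing or self-closing.
    // Group 1: "/" for a closing tag, group 2: tag name,
    // group 3: "/" for a self-closing tag.
    var tokenRe = /<(\/?)([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*?(\/?)>/g;
    var stack = [];   // currently open tags, innermost last
    var found = [];   // results, in the order the tags are closed
    var m;

    while ((m = tokenRe.exec(str)) !== null) {
        var isClosing = m[1] === '/';
        var name = m[2];
        var isSelfClosing = m[3] === '/';

        if (!isClosing && isSelfClosing) {
            // <name/> has no content
            found.push({ "tagName": name, "value": "" });
        } else if (!isClosing) {
            // remember where the content of this tag starts
            stack.push({ "tagName": name, "contentStart": tokenRe.lastIndex });
        } else {
            // find the nearest open tag with the same name
            var i = stack.length - 1;
            while (i >= 0 && stack[i].tagName !== name) {
                i--;
            }
            if (i >= 0) {
                var open = stack[i];
                // drop this tag and any unclosed tags nested inside it
                stack.length = i;
                found.push({
                    "tagName": name,
                    "value": str.slice(open.contentStart, m.index)
                });
            }
            // a closing tag without a matching open tag is ignored
        }
    }
    return found;
}

var nested = extractTags('a <selfClosedTag/> b <tag><tag>something</tag>else</tag>');
for (var i = 0; i < nested.length; i++) {
    console.log(nested[i]);
}
```

Inner tags are reported before the tags that contain them, and the value of an outer tag still includes the markup of its children, so for the nested example the outer `tag` gets the value `<tag>something</tag>else`.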