Tags: javascript, html, node.js, text-processing, text-parsing

How to remove all attributes from HTML?


I have raw HTML in which various tags carry CSS classes and other attributes.

Example:

Input:

<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>

and I would like to get plain HTML without any attributes, like:

Output:

<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>

I do not know the names of these classes in advance. I need to do this in JavaScript (Node.js).

Any idea?


Solution

  • This can be done with Cheerio, as I noted in the comments.
    To remove all attributes on all elements, you'd do:

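    var cheerio = require('cheerio');   // npm install cheerio

    var html = '<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>';

    // Parse the HTML. Newer Cheerio versions wrap fragments in <html>/<body>;
    // cheerio.load(html, null, false) parses the input as a fragment instead.
    var $ = cheerio.load(html);

    $('*').each(function() {      // iterate over all elements
        this.attribs = {};        // remove all attributes
    });

    var cleaned = $.html();       // get the HTML back, now without attributes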
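
If adding a dependency is not an option, a rough regular-expression fallback can handle simple fragments like the one in the question. This is only a sketch and not part of Cheerio: `stripAttributes` is a hypothetical helper name, and it assumes well-formed markup. It does not treat comments or `<script>`/`<style>` bodies specially, so a real parser such as Cheerio remains the safer choice for arbitrary input.

```javascript
// Hypothetical helper: strips attributes from every opening tag with a regex.
// Assumption: well-formed markup. Comments and <script>/<style> bodies are NOT
// skipped, so tag-like text inside them would be rewritten as well.
function stripAttributes(html) {
    // "<" + tag name, then a lazy run of quoted strings or other characters
    // (so a ">" inside a quoted value does not end the tag early),
    // then an optional "/" and the closing ">".
    var openingTag = /<([a-zA-Z][a-zA-Z0-9-]*)(?:"[^"]*"|'[^']*'|[^'">])*?(\/?)>/g;

    // Keep only the tag name and, for self-closing tags, the "/".
    return html.replace(openingTag, '<$1$2>');
}

var input = '<p class="opener" itemprop="description">Lorem ipsum</p>';
console.log(stripAttributes(input)); // <p>Lorem ipsum</p>
```

Closing tags such as `</p>` are left untouched because the pattern only matches a `<` that is directly followed by a letter.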