javascript cheerio

Get text keeping any text-decoration styling


Using Cheerio, I can already get the text I need, but I would also like to retain strikethrough text. If it is easier, I could just wrap that text in ~~ characters, since it will end up in a Discord embed.

My current code is as follows:

const cheerio = require('cheerio');
const fetch = (...args) => import('node-fetch').then(({default: fetch}) => fetch(...args));
    
async function scrapeInformation() {
  const response = await fetch('https://scp-wiki.wikidot.com/personnel-and-character-dossier');
  const body = await response.text();
    
  const $ = cheerio.load(body);
  const titleList = [];
    
  const wikiTab = $('#wiki-tab-0-6 > p');
    
  wikiTab.each((i, title) => {
    const titleNode = $(title);
    console.log(titleNode.find("span").attr("style"));
    const titleText = titleNode.text();

    titleList.push(titleText);
  });
}
    
scrapeInformation();

What I expect for the text with strikethrough is:

Anartist musician, co-founder of the "House of Spades" rock band, partially responsible for the creation of SCP-952. ~~In custody following two failed suicide attempts, designated D-952.~~ Missing, presumed deceased.

Website for reference: https://scp-wiki.wikidot.com/personnel-and-character-dossier (D-Class Tab)

Some HTML from the website:

<div id="wiki-tab-0-6" style="display: block;">
   <p>
      <strong>D-952 (formerly Veronica Fitzroy):</strong> Anartist musician, co-founder of the "House of Spades" rock band, partially responsible for the creation of SCP-952. <span style="text-decoration: line-through;">In custody following two failed suicide attempts, designated D-952.</span> Missing, presumed deceased.
   </p>
</div>

Any documentation or articles that could further my knowledge would be appreciated.


Solution

  • After some changes and testing I have created the solution below:

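    wikiTab.each((i, title) => {
       const titleNode = $(title);
       let titleText = titleNode.text();

       // Look at every span in the paragraph, not just the first one.
       titleNode.find("span").each((j, el) => {
          const style = $(el).attr("style") || "";
          const struck = $(el).text();

          // Only wrap spans whose inline style actually strikes the text through.
          if (style.includes("line-through") && struck) {
             // Replaces the first occurrence of the span's text in the paragraph text.
             titleText = titleText.replace(struck, () => "~~" + struck + "~~");
          }
       });

       titleList.push(titleText);
    });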
    

    What this does: it checks whether titleNode contains a styled span, reads that span's text (I honestly don't fully understand that part), and then replaces that text within the already extracted paragraph text with a copy wrapped in ~~, which Discord renders as strikethrough.
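
    An alternative to replacing substrings after the fact is to build the text while walking the node tree, so the ~~ markers land exactly where the span sits even if the same words appear elsewhere in the paragraph. The following is only a sketch under assumptions: isStruckThrough and toDiscordText are hypothetical helper names, and the node shape (type, data, children, attribs) is the one I believe Cheerio's raw elements expose; the sample object is hand-built for illustration.

```javascript
// Hypothetical helper: true when an inline style string applies line-through.
function isStruckThrough(style) {
  return /text-decoration(-line)?\s*:[^;]*line-through/i.test(style || "");
}

// Hypothetical helper: recursively turns a parsed node into plain text,
// wrapping struck-through elements in ~~ (Discord's strikethrough syntax).
function toDiscordText(node) {
  if (node.type === "text") return node.data;
  if (!node.children) return ""; // comments and other childless nodes
  const inner = node.children.map(toDiscordText).join("");
  const style = node.attribs && node.attribs.style;
  // Skip empty or whitespace-only spans so no bare "~~~~" is emitted.
  return isStruckThrough(style) && inner.trim() ? `~~${inner}~~` : inner;
}

// Hand-built sample mimicking the assumed node shape of a parsed <p>.
const sample = {
  type: "tag",
  name: "p",
  attribs: {},
  children: [
    { type: "tag", name: "strong", attribs: {}, children: [{ type: "text", data: "D-952:" }] },
    { type: "text", data: " Anartist musician. " },
    {
      type: "tag",
      name: "span",
      attribs: { style: "text-decoration: line-through;" },
      children: [{ type: "text", data: "In custody." }],
    },
    { type: "text", data: " Missing." },
  ],
};

console.log(toDiscordText(sample));
// prints: D-952: Anartist musician. ~~In custody.~~ Missing.
```

    If that node shape holds, the loop above could probably call titleList.push(toDiscordText(title)) directly, since title in the .each callback is the raw node rather than a Cheerio wrapper.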

    Any comments that could provide advice, articles and documentation would still be greatly appreciated so I can understand what I am actually doing.