Tags: javascript, html, getelementbyid, getelementsbyclassname

Access a page's HTML


Is it possible to take a link and access the HTML of the page it points to? For example, I would like to take a link from Amazon, put it in my own HTML code, use JavaScript's getElementsByClassName to get the price from that page, and display it back in my own HTML.


Solution

  • It is possible. You can send a GET request to the Amazon page, which returns the HTML in the response as a string. You then need to parse that string into a DOM; the last time I did this I used the Node module jsdom.

    In more detail:

    HTTP is the protocol we use to request data from a server. I've written an explanatory Node.js script:

    const https = require('https');
    const zlib = require('zlib');
    const { JSDOM } = require('jsdom');
    
    // Send the HTTP GET request
    https.get('https://www.amazon.com', (response) => {
      let html = '';
    
      // Amazon may gzip the response so that it is smaller and faster to send;
      // only decompress when the Content-Encoding header says so
      const isGzip = response.headers['content-encoding'] === 'gzip';
      const body = isGzip ? response.pipe(zlib.createGunzip()) : response;
      body.setEncoding('utf8');
    
      // The full HTML page is too big to send in one piece, so it arrives in chunks
      body.on('data', (chunk) => {
        html += chunk;
      });
    
      // When the transmission has finished we can do whatever we want with the HTML
      body.on('end', () => {
        const amazon = new JSDOM(html);
        console.log(amazon.window.document.querySelector('html').innerHTML);
      });
    }).on('error', (err) => {
      // Network-level failures (DNS, refused connection, ...) end up here
      console.error(err.message);
    });
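
The decompression step in the script can also be handled separately once the whole body has been collected. Here is a minimal sketch of that idea, assuming the body is already in a Buffer and the server's Content-Encoding header value is known. `decodeBody` is a hypothetical helper name, not something from the script above or from any library; it only uses Node's built-in zlib module.

```javascript
const zlib = require('zlib');

// Hypothetical helper (an assumption for illustration): decode a fully
// buffered response body according to its Content-Encoding header value.
function decodeBody(body, contentEncoding) {
  // A missing header means the body was sent uncompressed.
  const encoding = (contentEncoding || 'identity').trim().toLowerCase();
  switch (encoding) {
    case 'gzip':
    case 'x-gzip':
      return zlib.gunzipSync(body).toString('utf8');
    case 'deflate':
      return zlib.inflateSync(body).toString('utf8');
    case 'br':
      return zlib.brotliDecompressSync(body).toString('utf8');
    case 'identity':
      return body.toString('utf8');
    default:
      throw new Error(`Unsupported Content-Encoding: ${encoding}`);
  }
}

// Usage with locally compressed data, so no network access is needed.
const sample = '<span class="price">$19.99</span>';
console.log(decodeBody(zlib.gzipSync(sample), 'gzip') === sample); // true
console.log(decodeBody(Buffer.from(sample), undefined) === sample); // true
```

With a real request you would collect the chunks into an array, join them with `Buffer.concat`, and pass `response.headers['content-encoding']` as the second argument. The synchronous zlib calls keep the sketch short; for large pages the streaming approach in the script is probably the better fit.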