node.js web-scraping axios cheerio attr

Undefined 'href' attribute/jQuery


I recently found out about web scraping with axios and cheerio, and I thought it would be cool to write a JavaScript program that automatically gets the download links for all the episodes of a series. The program has three basic tasks: find each episode's link, get the link to its download page, and then get the link behind the 'Download 1080p' button. However, the link of that final download button, which I try to read with .attr('href'), comes back as undefined. Here's my code:

const fs = require("fs");
const axios = require("axios");
const cheerio = require("cheerio");
const epNum = 2; // Number of episodes to be downloaded.

async function main(){

    for(var i = 1; i <= epNum; i++){
        // Get the link of the first (yellow) download button (it leads to the download page).
        const res = await axios.get(`https://gogoanime.hu//bleach-episode-${i}`);
        const $ = cheerio.load(res.data);

        const downloadBtn = $("i.icongec-dowload").parent();
        const downloadPage = downloadBtn.attr("href");

        // Get the link of the last download button 'Download 1080p' (it Downloads an mp4).
        const secondRes = await axios.get(downloadPage);
        const $new = cheerio.load(secondRes.data); // cheerio.load is synchronous, so no await is needed.

        const download = $new('#content-download > div:nth-child(1) > div:nth-child(6) > a').attr('href'); // This is where the problem is.
        console.log(download);
    }
}

main();

Output:

undefined

I tried using .find() and .prop() as mentioned in other posts, but the problem remained. Then I thought it could be because there is more than one a element in the HTML, but the selector path I use was copied from the page itself. I have tried many other suggestions, and nothing has worked so far. All the modules are working properly. (By the way, I DO NOT support downloading series from this site; I only use the service for educational purposes.) Any help is greatly appreciated, thanks :)


Solution

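  • If you look at the page source, div#content-download is empty. It is filled in afterwards by the JavaScript below, which never runs when you fetch the page with axios, so cheerio finds no a element and .attr('href') returns undefined.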

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
    <script src="https://www.google.com/recaptcha/api.js?render=6LealdkbAAAAAHbox4XlHS8ZMQ6lkcx96WV62UfO"></script>
    
    <script>
    
    grecaptcha.ready(function() {
        grecaptcha.execute('6LealdkbAAAAAHbox4XlHS8ZMQ6lkcx96WV62UfO').then(function(token) {
            $.post(window.location.pathname, {
                captcha_v3: token,
                id: 'MTEwNjk='
            }, function(data, status) {
                $('#content-download').html(data);
            });
        });
    });
    </script>
    

    You can try sending the same POST request yourself and reading the data from the response.
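
A minimal sketch of that POST request follows, written in the same language as the question. It is an illustration under assumptions, not a confirmed working solution: the helper names (extractPostId, postTarget, buildPostBody, fetchDownloadFragment) are hypothetical, it uses Node's built-in fetch (Node 18+) instead of axios to stay dependency-free, and it takes the reCAPTCHA token as a parameter because that token is generated by Google's script running in a browser. If the server validates the token, a plain HTTP client has no way to produce one, and a headless browser may be the practical route.

```javascript
// Pull the `id` value out of the inline <script> on the download page.
// Assumption: the script keeps the `id: '...'` shape shown in the answer.
function extractPostId(html) {
  const match = /\bid:\s*['"]([^'"]+)['"]/.exec(html);
  return match ? match[1] : null;
}

// The page posts to window.location.pathname, i.e. the download page's
// own URL with the query string dropped.
function postTarget(downloadPageUrl) {
  const u = new URL(downloadPageUrl);
  return u.origin + u.pathname;
}

// Build the form-encoded body that jQuery's $.post sends for a plain object.
function buildPostBody(token, id) {
  return new URLSearchParams({ captcha_v3: token, id }).toString();
}

// Send the POST and return the HTML fragment that the page would insert
// into #content-download. `token` must be supplied by the caller.
async function fetchDownloadFragment(downloadPageUrl, token, id) {
  const res = await fetch(postTarget(downloadPageUrl), {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
      "X-Requested-With": "XMLHttpRequest",
    },
    body: buildPostBody(token, id),
  });
  if (!res.ok) {
    throw new Error(`POST failed with status ${res.status}`);
  }
  return res.text();
}
```

The returned fragment can then be passed to cheerio.load just like res.data in the question. Since the fragment is the content that goes inside #content-download, it does not contain that wrapper itself, so the selector should most likely start at the inner div rather than at '#content-download >'.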