
Confusion around returning a variable declaration statement (return a = 2;) in Puppeteer/Node.js


I'm trying to learn Node.js and a bit of async/await and promises, and I came across this example (https://remarkablemark.org/blog/2018/04/15/puppeteer-without-async-await/):

const puppeteer = require('puppeteer');

let _browser;
let _page;

puppeteer
    .launch()
    .then(browser => (_browser = browser))
    .then(browser => (_page = browser.newPage()))
    .then(page => page.goto('https://example.com'))
    .then(() => _page)
    .then(page => page.screenshot({ path: 'example.png' }))
    .then(() => _browser.close());

I understand .then(browser => (_browser = browser)) is the same as .then(function(browser) { return _browser = browser; }), but I'm a bit confused about why some lines use _browser and some use browser (and similarly _page and page).

I also have no idea what the .then(() => _page) line is for. Why doesn't this line take a parameter holding what the previous function returned, i.e. the result of page.goto(...)? I'm probably not understanding Puppeteer's .then() very well.

I'm trying to clean up this code and use promises and arrow functions:

const puppeteer = require('puppeteer');
const url = 'https://example.com'; // placeholder URL

puppeteer
    .launch()
    .then(function(browser) {
        return browser.newPage();
    })
    .then(function(page) {
        return page.goto(url).then(function() {
            return page.content();
        });
    })
    .then(function(html) {
       // do something
    })
    .catch(function(err) {
        //handle error
    });

Solution

  • When you assign a value to a variable, the assignment expression itself evaluates to the value that was assigned. For example:

    let x;
    const two = (x = 2); // the expression x = 2 evaluates to 2 (the number, not the string '2')
    console.log(two); // 2 (the value of the expression x = 2)
    console.log(x); // 2 (x is set above to equal 2)
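The same rule is why the concise arrow body in the question returns the browser. In this small sketch the names saved, concise and block are purely illustrative. A concise body such as v => (saved = v) returns the assigned value, while a block body without return still performs the assignment but returns undefined:

```javascript
let saved;

// Concise body: the parenthesised assignment is the returned expression
const concise = v => (saved = v);

// Block body: the assignment runs, but nothing is returned
const block = v => { saved = v; };

const fromConcise = concise('browser A');
const fromBlock = block('browser B');

console.log(fromConcise); // logs: browser A
console.log(fromBlock);   // logs: undefined
console.log(saved);       // logs: browser B (both functions performed the assignment)
```

This is why, when rewriting .then(browser => (_browser = browser)) with braces, you need an explicit return to keep passing browser down the chain.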

    Next, there are a few intricacies of Promises and .then() that are important to note:

    1. First, .then() always returns a new Promise. The value that this Promise resolves to can be obtained by calling .then(resolvedValue => ...) on it.

    2. The value that the Promise returned by .then() resolves to is whatever the callback you pass to .then() returns. For example, with .then(() => xyz), the Promise returned by this .then() call would be Promise<xyz>. Here is an example illustrating points 1 and 2:

    const promise = Promise.resolve(); // Promise<undefined> (a Promise that resolves to undefined)
    
    const promiseAbc = promise.then(() => 'abc'); // returns Promise<'abc'> (a Promise that resolves to 'abc')
    promiseAbc.then(abc => console.log(abc)); // "extract" the resolved value from promiseAbc (logs: 'abc')
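Points 1 and 2 are also why each step of a chain only sees what the step directly before it returned. The sketch below uses plain Promises with no Puppeteer involved. It records the value each callback receives. Note that a callback with a block body and no return passes undefined to the next step:

```javascript
const received = [];

const chain = Promise.resolve(1)
  .then(n => {
    received.push(n); // 1, the value the starting Promise resolved to
    return n + 1;     // becomes the resolved value of this .then()'s Promise
  })
  .then(n => {
    received.push(n); // 2
    // no return statement, so this .then()'s Promise resolves to undefined
  })
  .then(n => {
    received.push(n); // undefined
    return 'done';
  });

chain.then(result => console.log(received, result)); // logs: [ 1, 2, undefined ] done
```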

    3. As mentioned in point 2, the value returned from the .then() callback is used as the resolved value of the Promise returned by the .then() call. However, this works slightly differently when the callback returns a Promise. Instead of producing a Promise that resolves to another Promise, the Promise returned by .then() follows the Promise you returned: it waits for that Promise to settle, then resolves (or rejects) with the same result. Here is an example to clear this up:

    const promise = Promise.resolve(); // Promise<undefined> (a Promise that resolves to undefined)
    const promiseAbc = Promise.resolve('abc'); // Promise<'abc'> (a Promise that resolves to 'abc')
    
    const newPromise = promise.then(() => promiseAbc);
    // You might expect the above to set `newPromise` to:
    // Promise<Promise<'abc'>>
    // But because `promiseAbc` is a Promise, `newPromise` follows it and resolves to its value, so `newPromise` is effectively:
    // Promise<'abc'>
    // (It is still a new Promise object: newPromise === promiseAbc is false.)
    newPromise.then(abc => console.log(abc)); // get the resolved value from `newPromise` and log it. We see it is 'abc', not Promise<'abc'>
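Point 3 also means the chain waits for a returned Promise to settle before moving on. That is what lets .then(() => _page) hand the next step a page rather than a Promise. This sketch assumes a hypothetical delay helper built on setTimeout, standing in for a slow operation like browser.newPage():

```javascript
// Hypothetical helper: a Promise that resolves to `value` after `ms` milliseconds
const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));

const order = [];

const outer = Promise.resolve()
  .then(() => {
    order.push('returning inner Promise');
    return delay(50, 'page'); // a Promise is returned, so the chain waits for it
  })
  .then(value => {
    order.push(`next step received ${value}`); // receives 'page', not a Promise
    return value;
  });

outer.then(value => console.log(order, typeof value));
```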

    As a result, the following line does a few things:

    .then(browser => (_browser = browser))
    
    1. It sets the _browser variable to the browser value that the Promise returned by .launch() resolves to.

    2. The arrow function returns the value browser (as _browser = browser evaluates to browser).

    3. The .then() call returns a new Promise that resolves to the browser value. This happens because the value you return from a .then() callback becomes the resolved value of the Promise returned by .then() (see point 2 of the .then() intricacies above). As a result, the .then() that comes directly after this one receives browser as the argument to its arrow function.

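    This is also why some lines use _browser and some use browser: each .then() callback only receives the value returned by the step directly before it. The code therefore saves browser, and the Promise returned by browser.newPage(), in the outer variables _browser and _page so that any later step in the chain can still reach them.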
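Outer variables are one option. Another is to nest a .then() so that the later step stays inside the callback where page is still in scope, the same pattern the question's own code uses for page.content(). The sketch below uses a hypothetical mockBrowser object instead of Puppeteer so it runs on its own; only the method names newPage, goto and screenshot mirror Puppeteer's:

```javascript
// Hypothetical stand-in for a Puppeteer browser; every method returns a Promise
const calls = [];
const mockBrowser = {
  newPage: () => Promise.resolve({
    goto: url => { calls.push(`goto ${url}`); return Promise.resolve({ status: 200 }); },
    screenshot: opts => { calls.push(`screenshot ${opts.path}`); return Promise.resolve('image bytes'); },
  }),
};

const done = Promise.resolve(mockBrowser)
  .then(browser => browser.newPage())
  .then(page =>
    // Nested chain: `page` is still in scope here, so no outer `_page` variable is needed
    page.goto('https://example.com').then(() => page.screenshot({ path: 'example.png' }))
  );

done.then(result => console.log(calls, result));
```

The trade-off is that every extra value you need later adds another level of nesting (closing the browser here would need browser in scope too), which is one reason async/await reads more easily.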

    See code comments for an explanation of your code's evaluation process:

    // Promise<xyz> means that the Promise resolves to `xyz`,
    // which you can access by using `.then(xyz => ...)` on the Promise
    const puppeteer = require('puppeteer');

    let _browser;
    let _page;

    puppeteer
      .launch() // returns Promise<browser>
      .then(browser => (_browser = browser)) // sets _browser = browser, returns Promise<browser>
      .then(browser => (_page = browser.newPage())) // sets _page = browser.newPage() (so _page is a Promise<page>) and returns a Promise that follows it and resolves to `page` (see point 3 of the `.then()` intricacies above)
      .then(page => page.goto('https://example.com')) // receives `page` from the previous Promise. Returns Promise<HTTPResponse> (page.goto() resolves to the main response); this value is ignored, as we don't need the `HTTPResponse`
      .then(() => _page) // returns the `_page` Promise so that the next `.then()` receives its resolved value, `page` (see point 3 of the `.then()` intricacies above)
      .then(page => page.screenshot({ path: 'example.png' })) // receives `page` (the resolved value of the `_page` Promise returned above) and returns Promise<Buffer>; this value is ignored, as the next `.then()` doesn't use it
      .then(() => _browser.close()); // returns a Promise that resolves (to undefined) once the browser has closed
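To watch the values flow through the original chain without launching a real browser, the sketch below swaps in a hypothetical fakePuppeteer object. Its launch, newPage, goto, screenshot and close methods only record that they were called and return Promises. The chain itself is copied from the question:

```javascript
// Hypothetical stand-in for puppeteer: records each call and returns Promises like the real API
const log = [];
const fakePage = {
  goto: url => { log.push(`goto ${url}`); return Promise.resolve({ url }); },
  screenshot: opts => { log.push(`screenshot ${opts.path}`); return Promise.resolve('image bytes'); },
};
const fakeBrowser = {
  newPage: () => { log.push('newPage'); return Promise.resolve(fakePage); },
  close: () => { log.push('close'); return Promise.resolve(); },
};
const fakePuppeteer = {
  launch: () => { log.push('launch'); return Promise.resolve(fakeBrowser); },
};

let _browser;
let _page;

const finished = fakePuppeteer
  .launch()
  .then(browser => (_browser = browser))
  .then(browser => (_page = browser.newPage())) // _page holds a Promise, not the page itself
  .then(page => page.goto('https://example.com'))
  .then(() => _page) // the next step receives the page that _page resolves to
  .then(page => page.screenshot({ path: 'example.png' }))
  .then(() => _browser.close());

finished.then(() => console.log(log.join(' -> ')));
// logs: launch -> newPage -> goto https://example.com -> screenshot example.png -> close
```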
    
    

    Rather than doing all of this, using async/await as shown in the docs makes this more straightforward to follow:

    const puppeteer = require('puppeteer');
    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com');
      await page.screenshot({path: 'screenshot.png'});
      await browser.close();
    })();
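The code from the question, which loads a page and reads its HTML, can be written the same way. This is a sketch, not the only correct approach. getHtml is a hypothetical name. The function takes the puppeteer module as a parameter, a choice made here so that it can be exercised with a stand-in object, and it uses try/finally so the browser is closed even if goto() or content() throws:

```javascript
// Hypothetical helper: launch a browser, load `url`, and resolve to the page's HTML.
// `puppeteer` is passed in, e.g. getHtml(require('puppeteer'), 'https://example.com')
async function getHtml(puppeteer, url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.content(); // the page's full HTML as a string
  } finally {
    await browser.close(); // runs whether the steps above succeeded or threw
  }
}
```

Usage mirrors the question's .then()/.catch() structure: getHtml(require('puppeteer'), 'https://example.com').then(html => { /* do something */ }).catch(err => { /* handle error */ });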