javascript cookies web-scraping headless puppeteer

How to manage a login session in headless Chrome?

I want to create a scraper that:

opens a headless browser,
goes to a url,
logs in (there is steam oauth),
fills some inputs,
and clicks 2 buttons.

My problem is that every new headless browser instance starts without my login session, so I have to log in again and again...

How can I persist the session across instances? (I'm using Puppeteer with headless Chrome.)

Or how can I open a headless Chrome instance that is already logged in? (I have already logged in in my main Chrome window.)

Solution

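In Puppeteer you can access the session cookies through page.cookies().

So once you have logged in, you can read all the cookies and save them to a JSON file. Note that page.cookies() called without arguments returns only the cookies for the current page's URL, so pass the relevant URLs if the login sets cookies on other domains: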

const fs = require('fs');
const cookiesFilePath = 'cookies.json';
// Save the session cookies (run inside an async function, after logging in)
const cookiesObject = await page.cookies();
// Write the cookies to a file so that later browser instances can reuse them
fs.writeFile(cookiesFilePath, JSON.stringify(cookiesObject, null, 2), function (err) {
  if (err) {
    console.log('The file could not be written.', err);
    return;
  }
  console.log('Session has been successfully saved');
});
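The callback above works, but in an async scraper it can be tidier to await the write so errors reach the caller. A minimal sketch, assuming Node's built-in `fs.promises` API; `saveCookies` is a hypothetical helper name (not part of Puppeteer), and it only relies on the `page.cookies()` call used above, so any object with an async `cookies()` method can stand in for a Puppeteer page.

```javascript
const fs = require('fs').promises;

// Hypothetical helper (not part of Puppeteer): write the current cookies to disk.
// `page` is a Puppeteer Page, or any object with an async cookies() method.
// With a real Page and no URL arguments, page.cookies() returns only the
// cookies for the current URL.
async function saveCookies(page, filePath) {
  const cookies = await page.cookies();
  // Pretty-print so the saved session is easy to inspect by hand
  await fs.writeFile(filePath, JSON.stringify(cookies, null, 2), 'utf8');
  return cookies.length; // number of cookies written
}
```

After the login steps succeed, `await saveCookies(page, cookiesFilePath)` can replace the callback version; a failed write rejects the promise instead of only being logged.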

Then, on the next run, right before calling page.goto(), call page.setCookie() to load the cookies from the file one by one:

const previousSession = fs.existsSync(cookiesFilePath);
if (previousSession) {
  // If the file exists, load the cookies from it
  const cookiesString = fs.readFileSync(cookiesFilePath, 'utf8');
  const parsedCookies = JSON.parse(cookiesString);
  if (parsedCookies.length !== 0) {
    for (const cookie of parsedCookies) {
      await page.setCookie(cookie);
    }
    console.log('Session has been loaded in the browser');
  }
}
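The same loading step can also be written asynchronously, passing all cookies to `page.setCookie()` in a single call, since it accepts several cookie objects. A minimal sketch, assuming Node's `fs.promises` API; `loadCookies` is a hypothetical helper name, and the expiry filter is an added assumption (the code above does not do it): expired cookies would not restore the login anyway.

```javascript
const fs = require('fs').promises;

// Hypothetical helper (not part of Puppeteer): restore cookies saved earlier.
// `page` is a Puppeteer Page, or any object with an async setCookie(...cookies)
// method. Call it after browser.newPage() and before page.goto().
// Returns how many cookies were applied; 0 means "no usable saved session".
async function loadCookies(page, filePath) {
  let raw;
  try {
    raw = await fs.readFile(filePath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return 0; // nothing saved yet
    throw err;
  }
  const nowSeconds = Date.now() / 1000;
  // Assumption: skip cookies whose `expires` (seconds since the epoch) is in
  // the past; session cookies are stored with expires === -1 and are kept.
  const cookies = JSON.parse(raw).filter(
    (c) => c.expires === undefined || c.expires < 0 || c.expires > nowSeconds
  );
  if (cookies.length > 0) {
    // setCookie takes any number of cookies, so no loop is needed
    await page.setCookie(...cookies);
  }
  return cookies.length;
}
```

A return value of 0 tells the scraper it still has to go through the Steam login (and save the cookies afterwards); otherwise it can try page.goto() directly, keeping in mind that the server may still reject a stale session.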

Check out the Puppeteer docs for page.cookies() and page.setCookie().