javascript, node.js, scraper

How to manipulate default value retrieved from x-ray scraper (node.js)


This is my code:

var Xray = require('x-ray');  
var x = Xray();
x('http://someurl.com', 'tr td:nth-child(2)', [{  
    text: 'a',
    url: 'a@href'
  }]).write('results.json')

I need to populate the field named "text" with only the first word of each a tag. An example of an a tag's text:

"FirstWord SecondWord ThirdWord"

The actual result is text: FirstWord SecondWord ThirdWord

Desired result text: FirstWord

I could post-process the results.json file, but I don't like that approach.


Solution

  • There is a fork of the x-ray library made by cbou.
    Its custom x-ray API has a prepare function that can change the output:
    https://github.com/cbou/x-ray#xrayprepare-str--fn

    Example:

    function uppercase(str) {
      return str.toUpperCase();
    }
    
    xray('mat.io')
    .prepare('uppercase', uppercase)
    .select('title | uppercase')
    .run(function(err, title) {
      // title == MAT.IO
    });
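
Adapting the example above to the question, a helper that keeps only the first word might look like the sketch below. The name firstWord is hypothetical, and the commented-out registration simply mirrors the uppercase example; it assumes the fork's prepare behaves as shown there and has not been tested against the fork.

```javascript
// Hypothetical helper: returns the first whitespace-delimited word of a string.
function firstWord(str) {
  // Trim first so leading whitespace does not produce an empty first element.
  return String(str).trim().split(/\s+/)[0];
}

// Assumed usage with the fork, mirroring the uppercase example (untested):
//   .prepare('firstWord', firstWord)
//   .select('a | firstWord')

console.log(firstWord('FirstWord SecondWord ThirdWord')); // FirstWord
```

Splitting on /\s+/ rather than a single space also copes with tabs or repeated spaces inside the link text.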