java javascript web-scraping jsoup scraper

Jsoup: scraping values from an embedded JavaScript block in Java


I am using Jsoup to scrape some data. In my document, I have something like:

  <script type="text/javascript">
ta.store('mapsv2.geoName', 'Marseille');
ta.store('mapsv2.map_addressnotfound', 'Address not found');
ta.store('mapsv2.map_addressnotfound3', 'We couldn\'t find that location near {0}.  Please try another search.');
  </script>
  <script type="text/javascript">
window.mapDivId = 'map0Div';
window.map0Div = {
lat: 43.295246,
lng: 5.364188,
zoom: null,
locId: 5039388,
geoId: 187253,

My code:

   Document attractionDoc = Jsoup.connect(url).timeout(100000).get();
   System.out.println("attractionDoc "+attractionDoc.toString());

But I don't know how to extract the numbers after lat: and lng:.

Thanks for your help!


Solution

  • Jsoup does not parse or execute embedded JavaScript, so there is no direct way of getting the object members lat and lng from the window.map0Div object.

    But as indicated by @Ceiling Gecko, you can parse the contents of the script tag with other techniques, e.g. regular expressions.

    Assuming you have the script contents as a String called content, you may use something like:

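    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // DOTALL lets '.' match the line breaks between the object members;
    // the dot in "window.map0Div" is escaped so it matches literally.
    Pattern p = Pattern.compile(
        "window\\.map0Div\\s*=\\s*\\{.*?lat:\\s*(-?[0-9.]+),.*?lng:\\s*(-?[0-9.]+),",
        Pattern.DOTALL);
    Matcher m = p.matcher(content);
    if (m.find()) {
        String lat = m.group(1);
        String lng = m.group(2);
        // do whatever you need to do with the info
    }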
    

    Here is a fiddle with the regex: http://fiddle.re/1p0yd6
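
As a rough, self-contained sketch of the regex approach described above: the class name LatLngExtractor and its extract method are hypothetical names chosen for illustration, and the sample string is a trimmed copy of the snippet from the question with a closing brace added. The sketch uses only the standard library. In a real scraper, the script text would presumably come from Jsoup, for example by calling data() on each element returned by attractionDoc.select("script") and passing that string to extract.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: pulls lat/lng out of raw script text.
class LatLngExtractor {

    // DOTALL lets '.' cross the line breaks between the object's members;
    // the lazy '.*?' stops at the first lat/lng after the opening brace.
    private static final Pattern MAP_DIV = Pattern.compile(
            "window\\.map0Div\\s*=\\s*\\{.*?lat:\\s*(-?\\d+(?:\\.\\d+)?)"
                    + ".*?lng:\\s*(-?\\d+(?:\\.\\d+)?)",
            Pattern.DOTALL);

    /** Returns {lat, lng} if the script text contains the map0Div object. */
    static Optional<double[]> extract(String scriptContent) {
        Matcher m = MAP_DIV.matcher(scriptContent);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2))
        });
    }

    public static void main(String[] args) {
        // Sample input modeled on the question's snippet (closing brace assumed).
        String content = String.join("\n",
                "window.mapDivId = 'map0Div';",
                "window.map0Div = {",
                "lat: 43.295246,",
                "lng: 5.364188,",
                "zoom: null,",
                "locId: 5039388,",
                "};");
        extract(content).ifPresent(c ->
                System.out.println("lat=" + c[0] + ", lng=" + c[1]));
    }
}
```

Returning an Optional keeps the "script block not found" case explicit, which is useful when looping over several script elements where only one contains the map object.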