I am using Jsoup to scrape some data. In my document, I have something like:
<script type="text/javascript">
ta.store('mapsv2.geoName', 'Marseille');
ta.store('mapsv2.map_addressnotfound', 'Address not found'); ta.store('mapsv2.map_addressnotfound3', 'We couldn\'t find that location near {0}. Please try another search.'); </script>
<script type="text/javascript">
window.mapDivId = 'map0Div';
window.map0Div = {
lat: 43.295246,
lng: 5.364188,
zoom: null,
locId: 5039388,
geoId: 187253,
My code:
Document attractionDoc = Jsoup.connect(url).timeout(100000).get();
System.out.println("attractionDoc "+attractionDoc.toString());
But I don't know how to extract the numbers after lat: and lng:.
Thanks for your help!
Jsoup does not parse embedded JavaScript, so there is no easy way of getting the object members lat and lng from the window.map0Div object.
But as indicated by @Ceiling Gecko, you can parse the contents of the script tag with other techniques, e.g. regular expressions.
Assuming you have the script contents as a String called content, you may use something like:
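import java.util.regex.Matcher;
import java.util.regex.Pattern;

// DOTALL lets .*? cross line breaks; the dot in "window.map0Div" is escaped,
// and -? allows negative coordinates.
Pattern p = Pattern.compile("window\\.map0Div\\s*=\\s*\\{.*?lat:\\s*(-?[0-9.]+),.*?lng:\\s*(-?[0-9.]+),", Pattern.DOTALL);
Matcher m = p.matcher(content);
if (m.find()) {
    String lat = m.group(1);
    String lng = m.group(2);
    // do whatever you need to do with the info
}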
Here is a fiddle with the regex: http://fiddle.re/1p0yd6
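To tie this together, here is a minimal sketch of the extraction step as a small helper that uses only the standard library. The class name LatLngExtractor and the method extract are hypothetical names chosen for illustration. It assumes the script text looks like the snippet in the question (lat before lng, plain decimal numbers). With Jsoup, the script text for each element returned by attractionDoc.select("script") should be available through Element.data(); that step is only shown as a comment so the block runs without Jsoup on the classpath.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Jsoup): pulls lat/lng out of the
// "window.map0Div = { ... }" object literal found in a script's text.
final class LatLngExtractor {

    // DOTALL so that .*? can cross the line breaks inside the object literal.
    // Assumes lat appears before lng, as in the question's snippet.
    private static final Pattern MAP_DIV = Pattern.compile(
            "window\\.map0Div\\s*=\\s*\\{.*?\\blat:\\s*(-?\\d+(?:\\.\\d+)?)\\s*,"
                    + ".*?\\blng:\\s*(-?\\d+(?:\\.\\d+)?)",
            Pattern.DOTALL);

    // Returns {lat, lng} if the pattern is found, otherwise an empty Optional.
    static Optional<double[]> extract(String scriptContent) {
        Matcher m = MAP_DIV.matcher(scriptContent);
        if (!m.find()) {
            return Optional.empty();
        }
        double lat = Double.parseDouble(m.group(1));
        double lng = Double.parseDouble(m.group(2));
        return Optional.of(new double[] {lat, lng});
    }

    public static void main(String[] args) {
        // Stand-in for the script text. With Jsoup this would come from
        // something like:
        //   for (Element script : attractionDoc.select("script")) {
        //       String content = script.data();
        //       ...
        //   }
        String content = "window.mapDivId = 'map0Div';\n"
                + "window.map0Div = {\n"
                + "lat: 43.295246,\n"
                + "lng: 5.364188,\n"
                + "zoom: null,\n";

        Optional<double[]> result = extract(content);
        if (result.isPresent()) {
            double[] latLng = result.get();
            System.out.println("lat=" + latLng[0] + " lng=" + latLng[1]);
        } else {
            System.out.println("No coordinates found");
        }
    }
}
```

Returning an Optional means you can loop over every script element and simply skip the ones that do not contain window.map0Div. If the page layout changes (for example lng listed before lat), the pattern would need adjusting.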