I am struggling to get absolute paths for the images I am scraping from my website. I have read the documentation on jsoup.org, but I cannot get abs:src to work. I don't know how to use abs:src or where to add it.
<cfhttp method="get" url="https://theculturecook.com/recipe-slowroasted-pork-belly.html" result="theresult">
<cfscript>
Jsoup = createObject("java", "org.jsoup.Jsoup");
html = theresult.filecontent;
// Parsing a raw HTML string: the document gets no base URI
doc = Jsoup.parse(html);
tags = doc.select("img[src$=.jpg]");
</cfscript>
<cfset images = "">
<cfloop index="e" array="#tags#">
<cfset images = ListAppend(images, e.attr("src"))>
</cfloop>
<cfloop list="#images#" index="a">
<!--- use each image path in "a" here --->
</cfloop>
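The issue is that you are passing a raw HTML string to jsoup, so the parsed document has no base URI and abs:src has nothing to resolve relative paths against. If you need absolute paths, let jsoup fetch the page itself with Jsoup.connect(url).get(), which sets the base URI to the page URL, and then read the attribute as abs:src instead of src.
So finally,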
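<cfscript>
Jsoup = createObject("java", "org.jsoup.Jsoup");
// connect().get() fetches the page and records its URL as the base URI
doc = Jsoup.connect("https://theculturecook.com/recipe-slowroasted-pork-belly.html").get();
tags = doc.select("img[src$=.jpg]");
</cfscript>
<cfset images = "">
<cfloop index="e" array="#tags#">
<!--- abs:src resolves the src attribute against the document's base URI --->
<cfset images = ListAppend(images, e.attr("abs:src"))>
</cfloop>
<cfloop list="#images#" index="a">
<!--- use each absolute image URL in "a" here --->
</cfloop>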
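If you would rather keep the cfhttp call, a possible alternative is to pass the page URL as a second argument to Jsoup.parse, which jsoup then uses as the base URI. This is only a minimal sketch, assuming the same page and selector as above; pageUrl is a variable name introduced here for illustration.

```cfml
<cfset pageUrl = "https://theculturecook.com/recipe-slowroasted-pork-belly.html">
<cfhttp method="get" url="#pageUrl#" result="theresult">
<cfscript>
Jsoup = createObject("java", "org.jsoup.Jsoup");
// Jsoup.parse(html, baseUri): the second argument is the base URI
// that abs:src resolves relative paths against
doc = Jsoup.parse(theresult.filecontent, pageUrl);
tags = doc.select("img[src$=.jpg]");
</cfscript>
<cfset images = "">
<cfloop index="e" array="#tags#">
<cfset images = ListAppend(images, e.attr("abs:src"))>
</cfloop>
```

With a base URI set, e.attr("abs:src") should return full URLs for relative src values. Without one, it returns an empty string for them.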