Hi, I am using the HTTP Client step to get the source code of a website. I need to scrape out a particular part of one line.
Example line: <a href="....." ......>TEXT I WANT</a>
So I figured I would use a User Defined Java Class (UDJC) step in PDI: first split the text block into lines with String[] lines = code.split("\n+"); and then loop through the array, using an if condition (the regex check) to find the right line:
String outputString = null;
for (String line : lines) {
    if (line.matches(".*a href.*")) {
        outputString = line;  // keep the matching line, not the whole page
        break;
    }
}
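One plausible reason the loop never matches, assuming the page is served with Windows-style "\r\n" line endings: split("\n+") leaves a trailing "\r" on every line, and in Java regex "." does not match "\r" unless DOTALL is set. Because matches() must cover the whole string, ".*a href.*" then fails on every line. The sketch below splits on "[\r\n]+" instead; the class and method names are hypothetical.

```java
// Hypothetical helper class; the name LinkLineFinder is illustrative only.
class LinkLineFinder {

    // Returns the first line containing "a href", or null if none does.
    // "[\\r\\n]+" splits on both \n and \r (and skips empty lines, like "\n+"),
    // so no stray '\r' is left at the end of a line to break matches().
    static String findFirstLinkLine(String code) {
        for (String line : code.split("[\\r\\n]+")) {
            if (line.matches(".*a href.*")) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample page with Windows line endings.
        String page = "<html>\r\n<body>\r\n<a href=\"x.html\">TEXT I WANT</a>\r\n</body>\r\n";

        // With split("\n+") each line still ends in '\r', and '.' cannot
        // consume '\r' without DOTALL, so matches() fails.
        String[] naive = page.split("\n+");
        System.out.println(naive[2].matches(".*a href.*"));  // false

        System.out.println(findFirstLinkLine(page));  // <a href="x.html">TEXT I WANT</a>
    }
}
```

Alternatives with the same effect are calling line.trim() before matches(), or using the simpler line.contains("a href"), which is a plain substring test and ignores line terminators.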
I am also trying this as plain Java in an IDE, outside PDI, but I never get a match. Any idea how to fix this? Or is there a faster and easier way to get the chunk I want?
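A possibly simpler route is to skip the line splitting entirely and capture the link text from the whole page with one regex. This is a minimal sketch, assuming the wanted text is the content of the first <a ... href ...> element; the class and method names are hypothetical, and inside a UDJC the same pattern would be applied to the field holding the page source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative.
class AnchorTextExtractor {

    // Group 1 captures the text between the opening <a ...href...> tag and </a>.
    // DOTALL lets '.' match line breaks, so no splitting into lines is needed;
    // the lazy "*?" stops at the first closing </a>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*href[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text of the first link, or null if the page has none.
    static String firstAnchorText(String code) {
        Matcher m = ANCHOR.matcher(code);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String page = "<html><body>\r\n<a href=\"x.html\" class=\"c\">TEXT I WANT</a>\r\n</body></html>";
        System.out.println(firstAnchorText(page));  // TEXT I WANT
    }
}
```

Regex on HTML is brittle (for example, nested tags inside the link end up in the captured text), so this fits a single, predictable line better than arbitrary pages.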
In a similar case, I did something like what you want with a filter step.
Transformation steps:
Filter condition: the line contains "<a href"
Then check the output.
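For testing outside PDI, the filter idea can be mimicked in plain Java. This sketch assumes the page has already been turned into one row per line before the filter (the answer does not say how), and uses a literal contains() check, which, unlike matches(), is not affected by a trailing "\r". The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogue of the filter step: keep only the
// lines that contain "<a href", the same condition used in the filter.
class FilterRowsSketch {

    static List<String> linesWithLinks(String code) {
        return Arrays.stream(code.split("[\\r\\n]+"))        // one "row" per line
                .filter(line -> line.contains("<a href"))    // the filter condition
                .collect(Collectors.toList());               // rows passed on to the next step
    }

    public static void main(String[] args) {
        String page = "<p>intro</p>\n<a href=\"a.html\">first</a>\n<p>x</p>\n<a href=\"b.html\">second</a>\n";
        // Check the output: only the two link lines should be printed.
        linesWithLinks(page).forEach(System.out::println);
    }
}
```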