java html regex web-scraping data-processing

Web scraping and data processing in Java


I am writing a web scraper to extract stock quotes from Yahoo Finance, Google Finance, or Nasdaq. I can get the HTML element containing the stock prices, but I only need the dollar value from the result. For example, the sample output looks like the image below: [image: sample output]

I am using an image here because when I posted the actual HTML, only the dollar amounts (the desired results) showed up; the HTML entities and tags vanished. Here is my code: [image: scraper code]. I am not very familiar with regex; I tried it, but had no luck. How can I extract only the dollar amount from the output?


Solution

  • Try using java.util.regex.Matcher and java.util.regex.Pattern:

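    // inputLine is one line of the fetched HTML (from the question's code).
    // Group 1 captures the price: 1 to 4 digits, a dot, then 2 decimals.
    String pattern = "<td>\\$&.+;(\\d{1,4}\\.\\d{2})&";
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(inputLine);

    // find() looks for the pattern anywhere in the line
    if (m.find()) {
        System.out.println("Price: $" + m.group(1));
    }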
    

    Result:

    Price: $130.27 ...

    Example:

    http://ideone.com/fWgvL5#stdout
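
    For reference, here is a minimal self-contained sketch of the same approach. The class name PriceExtractor, the helper extractPrice, and the sample input line are all hypothetical; the actual HTML from the question was only posted as an image, so the markup below is an assumption about what it might look like, and the pattern may need adjusting for the real page.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceExtractor {
    // Same pattern as in the answer above, compiled once and reused.
    private static final Pattern PRICE =
            Pattern.compile("<td>\\$&.+;(\\d{1,4}\\.\\d{2})&");

    // Returns the captured price (group 1), or empty if the line does not match.
    static Optional<String> extractPrice(String line) {
        Matcher m = PRICE.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical input line; the real markup may differ.
        String inputLine = "<td>$&nbsp;130.27&nbsp;</td>";
        extractPrice(inputLine)
                .ifPresent(price -> System.out.println("Price: $" + price));
    }
}
```

    Returning an Optional makes the "no match" case explicit, so the caller can skip lines that contain no price instead of checking for null.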