How to extract data if the div class comes after an id?


I am trying to extract data from a div that comes after an element with an ID and an input with type=hidden. I cannot reach the class to get the links listed in it.

I am using Jsoup with Elements and .select() or .getElementById(), and I tried combining them to reach the class, without success. The site is https://www.ariva.de/aktien/suche. If you click the "Suche starten" search button, the result table appears. The links in this table are what I want to reach.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class DatenImportUnternehmen {

 public static void main (String[] args) {

  String url = "https://www.ariva.de/aktien/suche";


  try {

   Document document = Jsoup.connect(url).get();

   for (Element row : document.select("div.aktiensuche_result_table")) {
    if (row.select("input[type=hidden]").text().equals("")) {
     continue;
    }
    else {
     String raw = row.select("[type=hidden]").text();
     System.out.println(raw);
    }
   }
  }
  catch (Exception ex) {
   ex.printStackTrace();
  }
 }
}

I don't get any output. Eclipse just shows the program as terminated.


Solution

  • If I understand correctly, you want to get the links in the table that is generated when you click the search button on https://www.ariva.de/aktien/suche.

    The first problem is that the search results aren't available directly from this URL. Instead, when you click the search button, a POST request is made to https://www.ariva.de/aktiensuche/_result_table.m. The response to this request contains the table with the links that I believe you are interested in. Specifically, the response contains HTML, which is then dynamically added to the page as the results table.

    The second problem looks to be in the jsoup query. I can't see any hidden input fields in the result table, but it is easy enough to grab the links using document.select("a[href]").

    So for me this code:

    String searchUrl = "https://www.ariva.de/aktiensuche/_result_table.m";
    String searchBody = "page=0&page_size=25&sort=ariva_name&sort_d=asc&ariva_performance_1_year=_&ariva_performance_3_years=&ariva_performance_5_years=&index=0&founding_year=&land=0&industrial_sector=0&sector=0&currency=0&type_of_share=0&year=_all_years&sales=_&profit_loss=&sum_assets=&sum_liabilities=&number_of_shares=&earnings_per_share=&dividend_per_share=&turnover_per_share=&book_value_per_share=&cashflow_per_share=&balance_sheet_total_per_share=&number_of_employees=&turnover_per_employee=_&profit_per_employee=&kgv=_&kuv=_&kbv=_&dividend_yield=_&return_on_sales=_";
    
    // post request to search URL
    Document document = Jsoup.connect(searchUrl).requestBody(searchBody).post();
    
    // find links in returned HTML
    for(Element link:document.select("a[href]")) {
        System.out.println(link);
    }
    

    produces the output:

    <a href="/1-1_drillisch-aktie">1&amp;1 Drillisch</a>
    <a href="/11_88_0_solutions-aktie">11 88 0 Solutions</a>
    <a href="/1st_red-aktie">1st Red</a>
    <a href="/21st-_cent-_fox_b_new-aktie">21ST. CENT. FOX B NEW</a>
    <a href="/21st_century_fox-aktie">21st Century Fox</a>
    <a href="/2g_energy-aktie">2G Energy</a>
    <a href="/3i_group-aktie">3I Group</a>
    <a href="/3i_infrastructure-aktie">3I INFRASTRUCTURE</a>
    <a href="/3m_company-aktie">3M Company</a>
    <a href="/3u_holding-aktie">3U Holding</a>
    <a href="/3w_power-aktie">3W Power</a>
    <a href="/4imprint_group-aktie">4imprint Group</a>
    <a href="/4_sc-aktie">4 SC</a>
    <a href="/XS0421565150">6,625% Statkraft AS 09/19 auf Festzins</a>
    <a href="/7c_solarparken-aktie">7C Solarparken</a>
    <a href="/888_holdings-aktie">888 Holdings</a>
    <a href="/a-a-a-_aktiengesellschaft_allgemeine_anlageverwaltung-aktie">A.A.A. aktiengesellschaft allgemeine anlageverwaltung</a>
    <a href="/a-g-_barr_______ls-04167-aktie">A.G. BARR LS-,04167</a>
    <a href="/a-h-t-_syngas_technology-aktie">A.H.T. Syngas Technology</a>
    <a href="/a-s-_creation_tapeten-aktie">A.S. Creation Tapeten</a>
    <a href="/a-j_mucklow_group-1-aktie">A+J Mucklow Group</a>
    <a href="/a-jmucklow_grp_pref-_ls_1-aktie">A+JMUCKLOW GRP PREF. LS 1</a>
    <a href="/a2a-aktie">A2A</a>
    <a href="/aac_technologies_holding-aktie">AAC Technologies Holding</a>
    <a href="/aalberts-aktie">Aalberts</a>
    

    I hope this is more or less what you are after. To set search parameters, you will need to examine the search form and modify the form data in the searchBody string (or use the .data method instead of .requestBody to build the query).
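
    Editing the long searchBody literal by hand is error-prone, so one option is to assemble it from a map. The following is a minimal sketch using only the Java standard library (Java 10 or later for the URLEncoder overload used here). The class name SearchBodyBuilder and the method buildBody are hypothetical names chosen for illustration, and only a few of the parameters from searchBody are shown; whether the server accepts a shortened parameter list is an assumption that has not been checked, so the full set may be required.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of jsoup: builds a form-encoded request body.
public class SearchBodyBuilder {

    // Joins key/value pairs as key=value&key=value, URL-encoding both parts.
    static String buildBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("page", "1");          // second page of results (assumed zero-based)
        params.put("page_size", "25");
        params.put("sort", "ariva_name");
        params.put("sort_d", "asc");

        // Prints: page=1&page_size=25&sort=ariva_name&sort_d=asc
        System.out.println(buildBody(params));
    }
}
```

    The resulting string can be passed to .requestBody(...) in place of the searchBody literal. With the .data method the manual encoding is unnecessary, since jsoup accepts key/value pairs directly.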