Tags: python, selenium-webdriver, web-scraping, beautifulsoup

How to extract data from an interactive chart using Python? (Selenium + BeautifulSoup)


I need to extract data from the asset evolution chart present in this link (example): https://investidor10.com.br/carteira/572422/ (chart image attached).

I need the data for all the bars present in the chart: asset value, capital gain, and invested amount. I tried to extract using Selenium + BeautifulSoup, but I couldn't because the data is not present in the HTML and it only appears when you click on a bar of the chart. I searched the internet but couldn't find anything that helped me with this problem.

In summary, does anyone know how I can extract the data that appears in the asset evolution chart?

It doesn't necessarily need to be using Selenium + BeautifulSoup, but it needs to be in Python.

The chart from which I want to extract the data: (image not reproduced here)

I tried using Selenium + BeautifulSoup, but I don't know how to extract the data because it is dynamic and only appears when you select a bar on the chart.


Solution

  • There is no need to use Selenium or BeautifulSoup. In my opinion, the easiest and most direct way is to use the API from which the page pulls the data.

    How to know if content is loaded / rendered dynamically in this case?

    First indicator: when you open the website in a browser as a normal user, a loading animation or a delay appears for that area. Second indicator: the content is not included in the static response to the request. You can then use the browser's developer tools and look at the XHR requests in the Network panel to see which data is loaded from which resources: http://developer.chrome.com/docs/devtools/network
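The second indicator can also be checked from Python. This is only a minimal sketch: the helper name `missing_from_static_html`, the stand-in HTML string, and the chosen markers are all illustrative assumptions. In practice `static_html` would be the text of the page response (for example `requests.get(page_url, headers=...).text`), and the markers would be strings you would expect to find if the chart data were embedded in the page.

```python
def missing_from_static_html(html: str, markers: list[str]) -> list[str]:
    """Return the markers that do not occur anywhere in the raw HTML text."""
    return [marker for marker in markers if marker not in html]


# Hypothetical stand-in for the static page response (assumption, not the real page)
static_html = "<html><body><div id='chart'></div></body></html>"

# Strings expected in the page if the data were embedded (illustrative choice)
markers = ["3599.2496", "sum_equity"]

missing = missing_from_static_html(static_html, markers)
if missing:
    print("Not in the static HTML, probably loaded dynamically:", missing)
else:
    print("All markers found in the static HTML")
```

If the markers are missing from the static response but the values are visible in the browser, the data most likely arrives through a separate request.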

    If there is an API, use it; otherwise, fall back to Selenium.

    Example
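    import requests

    # Endpoint the chart loads its data from (found in the browser's network tab)
    url = 'https://investidor10.com.br/api/carteiras/charts/evolucao-patrimonio/572422/12/all'

    # 'some_valid_agent' is a placeholder: replace it with a real browser user-agent string
    response = requests.get(url, headers={'user-agent': 'some_valid_agent'}, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error status
    data = response.json()       # list of dicts, one per month
    print(data)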
    
    Result
    [{"month":"05\/23","date":"05\/23","sum_applied":3599.2496,"sum_equity":3794.7088999999996,"sum_flow":5198.9427,"profitability":0},{"month":"06\/23","date":"06\/23","sum_applied":4199.3396,"sum_equity":4621.3407,"sum_flow":6038.5586,"profitability":12.73},{"month":"07\/23","date":"07\/23","sum_applied":5163.1996,"sum_equity":5585.579299999999,"sum_flow":7031.7742,"profitability":10.32},{"month":"08\/23","date":"08\/23","sum_applied":7282.6224,"sum_equity":7601.287600000001,"sum_flow":9065.3382,"profitability":5.83},{"month":"09\/23","date":"09\/23","sum_applied":8304.412400000001,"sum_equity":8625.8636,"sum_flow":10053.2882,"profitability":5.14},{"month":"10\/23","date":"10\/23","sum_applied":8845.77940001,"sum_equity":8980.3758,"sum_flow":10872.9838,"profitability":2.68},{"month":"11\/23","date":"11\/23","sum_applied":10658.68980001,"sum_equity":11171.1518,"sum_flow":13193.6327,"profitability":5.8},{"month":"12\/23","date":"12\/23","sum_applied":12046.64240001,"sum_equity":13070.843799999999,"sum_flow":15134.4752,"profitability":9.41},{"month":"01\/24","date":"01\/24","sum_applied":13077.23640001,"sum_equity":13844.4296,"sum_flow":15645.7756,"profitability":6.68},{"month":"02\/24","date":"02\/24","sum_applied":14686.01640001,"sum_equity":15452.7688,"sum_flow":17294.9096,"profitability":5.94},{"month":"03\/24","date":"03\/24","sum_applied":16045.49640001,"sum_equity":16943.5274,"sum_flow":18794.0035,"profitability":6.26},{"month":"04\/24","date":"04\/24","sum_applied":17719.15640001,"sum_equity":17760.8627,"sum_flow":20053.741,"profitability":0.8},{"month":"05\/24","date":"05\/24","sum_applied":21831.56640001,"sum_equity":22332.2705,"sum_flow":24650.6796,"profitability":2.76}]
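To get from this result to the three values asked about in the question, the records can be reshaped into labeled rows. The sketch below is an assumption about what the fields mean, since the API is not documented here: `sum_applied` is taken to be the invested amount, `sum_equity` the asset value, and the capital gain is computed as their difference. The `month` strings are assumed to be in MM/YY form. The helper name `to_rows` is illustrative, and the two sample records are copied from the result above.

```python
from datetime import datetime

# Two records copied from the API result shown above
records = [
    {"month": "05/23", "date": "05/23", "sum_applied": 3599.2496,
     "sum_equity": 3794.7088999999996, "sum_flow": 5198.9427, "profitability": 0},
    {"month": "06/23", "date": "06/23", "sum_applied": 4199.3396,
     "sum_equity": 4621.3407, "sum_flow": 6038.5586, "profitability": 12.73},
]


def to_rows(records):
    """Map raw API records to labeled rows (field meanings are assumptions)."""
    rows = []
    for record in records:
        rows.append({
            # Assumption: "05/23" means May 2023 (MM/YY)
            "month": datetime.strptime(record["month"], "%m/%y").date(),
            # Assumption: sum_applied is the invested amount
            "invested": record["sum_applied"],
            # Assumption: sum_equity is the asset value
            "asset_value": record["sum_equity"],
            # Assumption: capital gain = asset value - invested amount
            "capital_gain": record["sum_equity"] - record["sum_applied"],
        })
    return rows


rows = to_rows(records)
for row in rows:
    print(f"{row['month']:%Y-%m}  invested={row['invested']:.2f}  "
          f"asset_value={row['asset_value']:.2f}  capital_gain={row['capital_gain']:.2f}")
```

With the live data, `records` would simply be the list returned by `.json()`. It is worth comparing one or two rows against the values the chart shows when a bar is clicked before relying on this mapping.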