Tags: web-scraping, tableau-api

Web scraping India Energy Dashboard data


I am trying to scrape data from the India Energy Dashboard (https://www.niti.gov.in/edm/#elecGeneration) using Python. When I click on download, the website returns the error NET::ERR_CERT_DATE_INVALID. I guess this is why I am not getting a 200 response. I also tried the TableauScraper library, but I get the error "NoneType has no attribute text". I am using the following code:

#!pip install TableauScraper

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/app/profile/niti.energy.vertical/viz/ElectricityGeneration_0/Source"
ts = TS()
ts.loads(url)

Solution

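  • The URL in the question is the address of the Tableau Public profile page, not of the visualization's data source. Open the Network tab in your browser's developer tools, find the request that loads the visualization, and use that URL instead (it has the form https://public.tableau.com/views/...). Here is one way to obtain the data: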

    from tableauscraper import TableauScraper as TS
    
    url = 'https://public.tableau.com/views/ElectricityGeneration_0/Source?%3Adisplay_static_image=y&%3AbootstrapWhenNotified=true&%3Aembed=true&%3Alanguage=en-US&:embed=y&:showVizHome=n&:apiID=host0'
    
    ts = TS()
    ts.loads(url)
    workbook = ts.getWorkbook()
    
    for t in workbook.worksheets:
        print(f"worksheet name : {t.name}")  # show the worksheet name
        print(t.data)  # show the DataFrame for this worksheet
    

    Result in terminal:

    worksheet name : Generation Trend by Source 
    Year Name-value Year Name-alias Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-value Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-alias SUM(Generation TWh)-value   SUM(Generation TWh)-alias   SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-value SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-alias Energy Source-alias
    0   FY06    FY06    FY06    FY06    697.06083   697.06083   0.22062 0.22062 WIND
    1   FY07    FY07    FY07    FY07    751.53005   751.53005   0.21588 0.21588 WIND
    2   FY08    FY08    FY08    FY08    809.263687  809.263687  11.065371   11.065371   WIND
    3   FY09    FY09    FY09    FY09    838.682997  838.682997  13.19954    13.19954    WIND
    4   FY10    FY10    FY10    FY10    898.527489  898.527489  15.171851   15.171851   WIND
    ... ... ... ... ... ... ... ... ... ...
    150 0   0   FY16    FY16    0   0   16.680499   16.680499   BIOMASS-BAGASSE
    151 0   0   FY17    FY17    0   0   14.15864    14.15864    BIOMASS-BAGASSE
    152 0   0   FY18    FY18    0   0   15.2523 15.2523 BIOMASS-BAGASSE
    153 0   0   FY19    FY19    0   0   16.326489   16.326489   BIOMASS-BAGASSE
    154 0   0   FY20    FY20    0   0   13.742429   13.742429   BIOMASS-BAGASSE
    155 rows × 9 columns
    
    worksheet name : Generation by Source in
    Energy Source-alias SUM(Generation TWh)-alias   SUM(Generation GWh)-alias   SUM(Generation GWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[pcto:sum:Generation_GWh:qk]-alias
    0   NUCLEAR 46.47245    46,472MWh   0.028636
    1   HYDRO   156.117158  156,117MWh  0.096197
    2   COAL    1199.742768 1,199,743MWh    0.739262
    3   BIOMASS-BAGASSE 13.742429   13,742MWh   0.008468
    4   DIESEL  2.027548    2,028MWh    0.001249
    5   NATURAL GAS 73.885792   73,886MWh   0.045527
    6   RENEWABLES  0.365895    366MWh  0.000225
    7   SMALL HYDRO 9.451229    9,451MWh    0.005824
    8   SOLAR   51.938299   51,938MWh   0.032004
    9   WIND    69.149642   69,150MWh   0.042609
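
To keep the worksheets printed above instead of only displaying them, each DataFrame can be written to its own CSV file. The following is a minimal sketch, not part of TableauScraper: `csv_filename` and `save_worksheets` are hypothetical helper names, and the sketch assumes `workbook` is the object returned by `ts.getWorkbook()` and that each `t.data` is a pandas DataFrame, as in the loop above.

```python
import re
from pathlib import Path


def csv_filename(worksheet_name: str) -> str:
    """Turn a worksheet name into a filesystem-friendly CSV file name (hypothetical helper)."""
    # Replace every run of characters other than letters, digits, '-' or '_' with one underscore.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", worksheet_name.strip()).strip("_")
    # Fall back to a generic name if nothing usable is left.
    return f"{slug or 'worksheet'}.csv"


def save_worksheets(workbook, directory="."):
    """Write each worksheet's DataFrame to <directory>/<worksheet name>.csv (hypothetical helper)."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for t in workbook.worksheets:
        path = out_dir / csv_filename(t.name)
        # Assumption: t.data is a pandas DataFrame, so to_csv is available.
        t.data.to_csv(path, index=False)
        paths.append(path)
    return paths
```

Calling `save_worksheets(workbook, "energy_data")` after `ts.getWorkbook()` should leave one CSV file per worksheet in the `energy_data` directory. Note that two worksheets whose names reduce to the same file name would overwrite each other in this sketch.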
    

    For documentation, please see https://github.com/bertrandmartel/tableau-scraping
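
Comparing the URL in the question with the one in the answer suggests a pattern: the profile page path `/app/profile/<user>/viz/<workbook>/<sheet>` corresponds to the embed path `/views/<workbook>/<sheet>`. The sketch below applies that rewrite. `embed_url` is a hypothetical helper, the pattern is inferred only from these two URLs, and whether the shortened query string (`:embed=y&:showVizHome=n`, taken from the answer's URL) is enough for every dashboard is an assumption; checking the Network tab remains the reliable way to confirm the address.

```python
from urllib.parse import urlsplit


def embed_url(profile_url: str) -> str:
    """Rewrite a Tableau Public profile URL into a /views/ URL (hypothetical helper)."""
    parts = urlsplit(profile_url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected layout: app / profile / <user> / viz / <workbook> / <sheet>
    if len(segments) != 6 or segments[:2] != ["app", "profile"] or segments[3] != "viz":
        raise ValueError(f"not a Tableau Public profile URL: {profile_url}")
    workbook, sheet = segments[4], segments[5]
    # Assumption: these two parameters from the answer's URL are sufficient.
    return f"{parts.scheme}://{parts.netloc}/views/{workbook}/{sheet}?:embed=y&:showVizHome=n"


question_url = (
    "https://public.tableau.com/app/profile/niti.energy.vertical"
    "/viz/ElectricityGeneration_0/Source"
)
print(embed_url(question_url))
```

The printed URL can then be passed to `ts.loads(...)` in place of the profile page address.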