python pandas dataframe numpy google-colaboratory

Converting ordinary data into a time series DataFrame with pandas (Python)


I have a small problem converting data to a time series. Here are the steps I carried out. I scrape the data with Beautiful Soup, a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.

import json
import re

import requests
from bs4 import BeautifulSoup

url1 = 'http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=xxx&t=BBRI'
url2 = 'http://financials.morningstar.com/finan/financials/getKeyStatPart.html?&callback=xxx&t=BBRI'

soup1 = BeautifulSoup(json.loads(re.findall(r'xxx\((.*)\)', requests.get(url1).text)[0])['componentData'], 'lxml')
soup2 = BeautifulSoup(json.loads(re.findall(r'xxx\((.*)\)', requests.get(url2).text)[0])['componentData'], 'lxml')

def print_table(soup):
    for i, tr in enumerate(soup.select('tr')):
        row_data = [td.text for td in tr.select('th, td') if td.text]
        if not row_data:
            continue
        if len(row_data) < 12:
            row_data = ['X'] + row_data
        for j, td in enumerate(row_data):
            if j==0:
                print('{: >30}'.format(td))
            else:
                print('{: ^12}'.format(td))
        print()


print_table(soup1)

This produces the following output:

          X
  2010-12   
  2011-12   
  2012-12   
  2013-12   
  2014-12   
  2015-12   
  2016-12   
  2017-12   
  2018-12   
  2019-12   
    TTM     

               Revenue IDR Mil
 30,552,600 
 40,203,051 
 43,104,711 
 51,133,344 
 59,556,636 
 69,813,152 
 82,504,537 
 90,844,308 
 99,067,098 
108,468,320 
105,847,159 

I need to convert it to a pandas DataFrame that looks like this:

data

        X    Revenue IDR Mil
  2010-12         30,552,600
  2011-12         40,203,051
  2012-12         43,104,711
  2013-12         51,133,344
  2014-12         59,556,636
  2015-12         69,813,152
  2016-12         82,504,537
  2017-12         90,844,308
  2018-12         99,067,098
  2019-12        108,468,320
  2020-12        105,847,159

Solution

  • This is a bit simplified from what you are doing, but I think it gets you where you need to be. It is mostly adapted from Bitto Bennichan:

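    import io

    import pandas as pd
    import requests

    url1 = 'http://financials.morningstar.com/finan/financials/getFinancePart.html?t=BBRI'
    url2 = 'http://financials.morningstar.com/finan/financials/getKeyStatPart.html?t=BBRI'

    # Without the callback parameter, the response is parsed directly as JSON
    lm_json = requests.get(url1).json()
    # componentData holds the HTML table; newer pandas versions expect a file-like object
    df_list = pd.read_html(io.StringIO(lm_json["componentData"]))
    # Transpose so that each period becomes a row
    df = df_list[0].transpose()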
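
If you would rather build the frame from the rows you already collect in print_table, something along these lines might work. It is only a sketch: rows_to_timeseries and its ttm_period argument are hypothetical names, the sample rows are a shortened copy of the output in the question, and it assumes every row has the same number of cells and that periods are written as YYYY-MM.

```python
import pandas as pd

# Sample rows shaped like the row_data lists built in print_table (shortened)
rows = [
    ["X", "2010-12", "2011-12", "2012-12", "TTM"],
    ["Revenue IDR Mil", "30,552,600", "40,203,051", "43,104,711", "105,847,159"],
]


def rows_to_timeseries(rows, ttm_period=None):
    """Turn [header_row, *value_rows] into a DataFrame indexed by period.

    Hypothetical helper: ttm_period is the label to use for the TTM row;
    if it is None, the TTM row is dropped instead.
    """
    header, *value_rows = rows
    periods = header[1:]
    # One column per metric, one row per period
    data = {row[0]: row[1:] for row in value_rows}
    df = pd.DataFrame(data, index=periods)
    # Strip thousands separators and convert to numbers; unparseable cells become NaN
    df = df.apply(
        lambda col: pd.to_numeric(col.str.replace(",", "", regex=False), errors="coerce")
    )
    if ttm_period is None:
        # TTM is not a calendar period, so leave it out
        df = df.drop(index="TTM", errors="ignore")
    else:
        df = df.rename(index={"TTM": ttm_period})
    # Monthly PeriodIndex named after the first header cell
    df.index = pd.PeriodIndex(df.index, freq="M", name=header[0])
    return df


ts = rows_to_timeseries(rows, ttm_period="2020-12")
print(ts)
```

Labelling TTM as 2020-12 follows the layout asked for in the question; leave ttm_period unset to drop that row. If you prefer a DatetimeIndex over periods, ts.to_timestamp() converts the index.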