
Create a pandas DataFrame from web-scraped stock prices


I'm trying to write a script that creates a pandas DataFrame (df) and adds a stock price to it at a fixed interval x. The data comes from web scraping.

This is my code, but I have no idea how to add new data to the df at every interval x (e.g. 1 min) instead of replacing the old data.

    import bs4 as bs
    import urllib.request
    import time as t
    import re
    import pandas as pd
    
    i = 1
    while i == 1:
        # !! Scraping
    
        link = 'https://www.onvista.de/aktien/DELIVERY-HERO-SE-Aktie-DE000A2E4K43'
    
        parser = urllib.request.urlopen(link).read()
        soup = bs.BeautifulSoup(parser, 'lxml')
    
        stock_data = soup.find('ul', {'class': 'KURSDATEN'})
        stock_price_eur_temp = stock_data.find('li')
        stock_price_eur = stock_price_eur_temp.get_text()
    
        final_stock_price = re.sub('[EUR]','', stock_price_eur)
        print (final_stock_price)
        t.sleep(60)
    
        # !! Building a dataframe
    
        localtime = t.asctime(t.localtime(t.time()))
      
        stock_data_b = {
        'Price': [final_stock_price],
        'Time': [localtime],
                    }
    
        df = pd.DataFrame(stock_data_b, columns=['Price', 'Time'])
   

I hope you can help me with an idea for this problem.


Solution

  • Because you create df inside the loop, you reassign that variable on every iteration, overwriting the data from the previous one. You want to initialize a dataframe before the loop, and then add to it each time.

    Before your loop, add the line

    df2 = pd.DataFrame()
    

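    which just creates an empty dataframe. After the end of the code you posted (still inside the loop), add

    df2 = pd.concat([df2, df], ignore_index=True)
    

    which will tack each new df on to the end of df2. (The older spelling df2 = df2.append(df, ignore_index=True) did the same thing, but DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is the call that works on current versions.)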
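
As a variation on the same idea, here is a minimal sketch that collects each reading as a dict in a plain Python list and builds the dataframe once at the end, which avoids copying the growing dataframe on every iteration. The names fetch_price and collect_prices are hypothetical and not from the question. fetch_price is a stand-in that returns a fixed string, so the example runs without network access. In real use it would wrap the scraping code from the question.

```python
import time

import pandas as pd


def fetch_price():
    """Hypothetical stand-in for the scraping step; returns a fixed price string."""
    return "101,50"


def collect_prices(fetch, n_samples, interval_seconds=0.0):
    """Call fetch() n_samples times and return one DataFrame with all readings."""
    rows = []  # one dict per reading; appending to a list is cheap
    for _ in range(n_samples):
        # Record the timestamp together with the price, before sleeping
        rows.append({"Price": fetch(), "Time": pd.Timestamp.now()})
        time.sleep(interval_seconds)
    # Build the DataFrame once, after the loop
    return pd.DataFrame(rows, columns=["Price", "Time"])


df = collect_prices(fetch_price, n_samples=3)
print(df)
```

For a once-a-minute reading you would pass interval_seconds=60. If the loop is meant to run indefinitely, the pd.concat approach above (or periodically writing the rows to a file) may fit better, since this sketch only returns the dataframe after the last sample.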