Search code examples
pandasdataframeweb-scrapingexport-to-csv

scraping a table from website using pandas and saving to csv file


Iam new to python, I scraping a table from website using pandas and saving it as a csv file and running the code in a loop every 60 seconds. I want the file name to be different or numbered every time the loop runs. I have tried the below

import pandas as pd
import time
starttime = time.time()
i=1
while True:
     url = 'https://www.moneycontrol.com/india/indexfutures/nifty/9/2021-05-27/OPTIDX/CE/12800.00/true
     optionchain = pd.read_html(url,attrs = {'class' : 'tblopt'})
     chaindata = pd.DataFrame(optionchain[1])
     chaindata1 = chaindata.rename(columns={0:'LTPcall',1:'Net Change',2:'Volume',3:'Open 
     Interest',4:'Change In Open Int',5:'StrikePrice',6:'LTPput',7:'Net Change',8:'Volume',9:'Open 
     Interest',10:'Change In Open Int'})
     s = 'file'
     x = (s+str(i))
     chaindata1.to_csv(r'C:\Users\dell\Desktop\data\%x.csv')
     i=+1
     time.sleep(60.0 - ((time.time() - starttime) % 60.0))

by running this i get first file as file1 and then file(x) and it keeps overwritting file(x) i want it to flow file1,file2,file3 and so on


Solution

  • The i=+1 doesn't do anything, it just assigns +1 to i. Alson, you can use str.format to format the filename. For example:

    import pandas as pd
    import time
    
    starttime = time.time()
    i = 1
    while True:
        url = "https://www.moneycontrol.com/india/indexfutures/nifty/9/2021-05-27/OPTIDX/CE/12800.00/true"
        optionchain = pd.read_html(url, attrs={"class": "tblopt"})
        chaindata = pd.DataFrame(optionchain[1])
        chaindata1 = chaindata.rename(
            columns={
                0: "LTPcall",
                1: "Net Change",
                2: "Volume",
                3: "Open Interest",
                4: "Change In Open Int",
                5: "StrikePrice",
                6: "LTPput",
                7: "Net Change",
                8: "Volume",
                9: "Open Interest",
                10: "Change In Open Int",
            }
        )
        chaindata1.to_csv(
            r"C:\Users\dell\Desktop\data\file{}.csv".format(i)
        )  # <-- use str.format here
        i += 1  # <-- use i += 1 instead of i = +1
        time.sleep(60.0 - ((time.time() - starttime) % 60.0))