Search code examples
pythondataframehtml-tablespreadsheetexport-to-excel

Create and import only last sheet data with found error "UserWarning: Calling close() on already closed file."


When I run blow code I got error : UserWarning: Calling close() on already closed file. warn("Calling close() on already closed file.")

I cannot save my first three code and excel sheet also.

import openpyxl
from os import path
import pandas as pd

def load_workbook(wb_path):
    if path.exists(wb_path):
        return openpyxl.load_workbook(wb_path)
    return openpyxl.Workbook()

wb_path = './output/trail.xlsx'
wb = load_workbook(wb_path)

list = [530001, 530011, 530017, 530023]

for item in list:
    url_1 = f'https://www.screener.in/company/{item}/'

    df0 = pd.read_html(url_1)
        
    data1 = df0[1]
    
    with pd.ExcelWriter(wb_path) as writer:

        data1.to_excel(writer,sheet_name=str(item),startcol=0 ,startrow=1,index=False)
        
    writer.save()

Solution

  • First things first: please don't call a list "list". list is a built-in data type in Python, which your code is overwriting. E.g.:

    print(type(list))
    <class 'type'>
    
    list=[1,2,3]
    
    print(type(list))
    <class 'list'>
    

    Your error is related to this info from the pd.ExcelWriter documentation:

    The writer should be used as a context manager. Otherwise, call close() to save and close any opened file handles.

    Since you are indeed using pd.ExcelWriter as a context manager (through the with statement), the file will already be saved and closed, when you get to the line writer.save(). We don't need it.

    The next problem is that pd.ExcelWriter is used with the default mode, namely "w" (write). In this mode you keep overwriting the same (first) sheet. This is the reason why you end up with only the data from the last item. To avoid this, use mode="a" (append). Probably also wise to specify if_sheet_exists="replace" (or whatever you prefer; have a look at the options). Let's also specify the engine: engine="openpyxl".

    Finally, your current code will open/save/close the file 4 times, namely during each loop. This is unnecessary. To avoid this, simply place the for loop within the with statement. So, putting that all together:

    lst = [530001, 530011, 530017, 530023]
        
    with pd.ExcelWriter(wb_path, mode="a", if_sheet_exists="replace", engine="openpyxl") as writer:
        
        for item in lst:
            url_1 = f'https://www.screener.in/company/{item}/'
        
            df0 = pd.read_html(url_1)
                
            data1 = df0[1]
        
            data1.to_excel(writer, sheet_name=str(item),startcol=0 ,startrow=1,index=False)
    

    Result:

    xslx-writer