The URL for the Excel file is: https://www.gso.gov.vn/wp-content/uploads/2024/03/IIP-ENG.xlsx
I have this code:
from datetime import datetime, timedelta
url = 'https://www.gso.gov.vn/wp-content/uploads/' + datetime.strftime(datetime.now() - timedelta(30), '%y') +'/' + datetime.strftime(datetime.now() - timedelta(30), '%m') + '/IIP-ENG.xlsx'
import requests
resp = requests.get(url, verify=False)
output = open('IIP.xlsx', 'wb')
output.write(resp.content)
output.close()
I can see a file being downloaded but I can't open it in Office Excel. The file is corrupted.
resp
<Response [404]>
I also can't open it using this code:
import pandas as pd
df = pd.read_excel(open('IIP.xlsx', 'rb'),sheet_name=0, engine='openpyxl')
print(df.head(5))
This fails with a BadZipFile error saying the file is not a zip file.
How can I fix this?
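The issue is with the year format: '%y' gives 24, but the URL needs '%Y' for 2024. With the two-digit year the server returns a 404, so the saved file holds the server's error response rather than a workbook, which is why Excel and openpyxl both reject it. Use: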
datetime.strftime(datetime.now() - timedelta(30), '%Y')
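Putting that together, here is a minimal sketch of the corrected download. The names `BASE`, `build_iip_url` and `download` are hypothetical helpers introduced for illustration, and the 30-second timeout is an assumption. `verify=False` is carried over from the question. The status check means a 404 raises an error instead of silently writing an unusable file.

```python
from datetime import date, timedelta

# Base path taken from the URL in the question.
BASE = "https://www.gso.gov.vn/wp-content/uploads"


def build_iip_url(today: date) -> str:
    """Build the IIP-ENG.xlsx URL for the month 30 days before `today`."""
    target = today - timedelta(days=30)
    # %Y is the four-digit year (2024); %y would give 24 and lead to a 404.
    return f"{BASE}/{target:%Y}/{target:%m}/IIP-ENG.xlsx"


def download(url: str, path: str) -> None:
    """Save the file at `url` to `path`, failing loudly on HTTP errors."""
    # Imported here so the URL helper works without requests installed.
    import requests

    # verify=False mirrors the question; timeout is an assumed value.
    resp = requests.get(url, verify=False, timeout=30)
    # Raise on 4xx/5xx so an error response is never written to disk.
    resp.raise_for_status()
    with open(path, "wb") as f:
        f.write(resp.content)
```

Calling `download(build_iip_url(date.today()), 'IIP.xlsx')` would then either write a real workbook or raise `requests.HTTPError`, for example when the file for that month has not been published yet.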