I have made this PDF scraping tool, which runs fine in a Jupyter notebook. However, when I move it to IDLE I get the error shown at the bottom. There are no key errors in Jupyter, so I'm not sure why the result isn't printing!
I'm new to this, so any help is much appreciated.
# In[46]:
import tabula
import pandas as pd
# In[52]:
URL = "http://ir.eia.gov/wpsr/overview.pdf"
table = tabula.read_pdf(URL,pages=1)
df = table[0]
df
# In[102]:
i = 4
result = []
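while i < 23:
    # Columns 2 and 3 of the row hold the new and old values as strings
    phrase1 = df.iloc[i][2]
    phrase2 = df.iloc[i][3]
    # Keep phrase1 up to one digit after the decimal point
    dot = phrase1.find(".")
    plus = int(dot) + 2
    left = phrase1[:dot]
    right = phrase1[dot:int(plus)]
    new_crude = (left + right)
    # Same for phrase2 (the slice end must use dot2, not dot)
    dot2 = phrase2.find(".")
    plus2 = int(dot2) + 2
    left2 = phrase2[:dot2]
    right2 = phrase2[dot2:int(plus2)]
    old_crude = (left2 + right2)
    # Strip thousands separators before converting to float
    num = float(new_crude.replace(',','')) - float(old_crude.replace(',',''))
    result.append(round(num,2))
    i = i+1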
result
# In[100]:
products = ["EIA (OCAL):",
            "CRUDE:",
            "GASOLINE:",
            "DISTILLATE FUEL OIL:",
            "PROPANE:"]
products
# In[109]:
print(products[0])
print(products[1]+str(result[1])+"m")
print(products[2]+str(result[3])+"m")
print(products[3]+str(result[9])+"m")
print(products[4]+str(result[14])+"m")
# In[ ]:
ERROR:
Traceback (most recent call last):
  File "C:\Users\cbullock\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\cbullock\AppData\Local\Programs\Python\Python39\EIA weekly - Automated (CB and EM).py", line 22, in <module>
    df = table[0]
  File "C:\Users\cbullock\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\cbullock\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 0
This happens because the wrong package was installed with pip for the Python that IDLE uses (with Jupyter, the right one was installed automatically).
You need tabula-py, not tabula. The correct command is:
pip install tabula-py
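To check which of the two is actually installed for the interpreter that IDLE runs, something like the following can be run from IDLE itself. It is only a sketch: `installed_version` is a helper name made up for this example, and it relies on `importlib.metadata` from the standard library (Python 3.8+, so it should be available on the Python 3.9 shown in the traceback).

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The interpreter running this script; pip has to install into this one.
print("interpreter:", sys.executable)

# Report both distributions so a mix-up is easy to spot.
for name in ("tabula-py", "tabula"):
    print(name, "->", installed_version(name))
```

If `tabula-py` comes back as `None`, running `python -m pip install tabula-py` with that same interpreter should put it where IDLE can see it.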
Do not run:
pip install tabula
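One more thing worth checking, going by the traceback: the KeyError is raised inside pandas' `DataFrame.__getitem__` (frame.py), so in the IDLE run `table` was already a single DataFrame and `table[0]` was treated as a lookup of a column named 0. In the notebook the same call evidently returned something indexable by position. A small guard that accepts either shape is sketched below; `first_table` is a hypothetical helper name, not part of tabula-py, and the usage line in the comment assumes the `URL` from the question.

```python
def first_table(result):
    """Return the first table from a read_pdf-style result.

    Accepts either a list/tuple of tables or a single table object,
    so the calling code does not depend on which shape comes back.
    """
    if isinstance(result, (list, tuple)):
        if not result:
            raise ValueError("no tables were extracted from the PDF")
        return result[0]
    # Already a single table (e.g. one DataFrame): use it as-is.
    return result


# Intended use in the script from the question (assumption, not run here):
# df = first_table(tabula.read_pdf(URL, pages=1))
```

This does not replace installing the right package; it only makes the `df = table[0]` step tolerant of both return shapes.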