I've been struggling with this matter for two full days now due to my incompetence. After trying almost every Stack Overflow answer and other solution I could find, I've sadly still had no luck.
I'm using tabula-py to import tables from PDFs. After the import, the result already looks like a "perfect" DataFrame. The part of the code used for this is:
import tabula
tables = tabula.read_pdf(file, pages=18, lattice=True, multiple_tables=False)
print(tables)
[Output after printing the table] [1]: https://i.sstatic.net/82Qpa.png
However, it seems to be a list object, which blocks me from doing anything with it besides printing. Even using integers and renaming columns doesn't work, with the errors leading back to "Cannot XX because it's a list object". I was under the impression that tabula-py returns a pandas DataFrame directly.
Now when I try to add the following code to rename the columns as desired:
tables.columns = ['HS_Code', 'Product', 'PreviousMonth', 'CurrentMonth', 'LastYear']
I get the error:
AttributeError: 'list' object has no attribute 'columns'
I've tried many ways of renaming and different output formats such as JSON. Still no luck; it's still a "list object".
Does anyone have experience with this matter? How can I ensure the table I have is an actual DataFrame instead of a list object?
Any tips would be highly appreciated.
I am not familiar with tabula-py objects, but considering this post you can do one of the following:
1. use pandas.read_clipboard() after copying the PDF content by hand, or
2. save the tabula-py object as CSV and use pandas.read_csv()
to get the DataFrame. Afterwards you are able to manipulate the data (e.g. change column names) using pandas.
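It may also be worth checking whether tables is simply a list that contains the DataFrame. As far as I know, recent tabula-py versions (2.0 and later) return a list of DataFrames from read_pdf, even with multiple_tables=False. Below is a minimal sketch under that assumption. It uses a hand-made list in place of the real read_pdf result, the sample rows are invented for illustration, and an in-memory buffer stands in for a CSV file on disk:

```python
import io

import pandas as pd

# Stand-in for the value returned by tabula.read_pdf(...): a list holding
# one DataFrame. The rows below are made up for illustration only.
tables = [
    pd.DataFrame(
        [["0101", "Horses", 10, 12, 9], ["0102", "Cattle", 20, 18, 25]]
    )
]

print(type(tables))  # <class 'list'>

# Pick the DataFrame out of the list, then rename its columns.
df = tables[0]
df.columns = ["HS_Code", "Product", "PreviousMonth", "CurrentMonth", "LastYear"]

# The CSV route (option 2): write the table out and read it back with pandas.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
# Read HS_Code as text so leading zeros are not lost.
df_from_csv = pd.read_csv(buffer, dtype={"HS_Code": str})

print(df_from_csv.head())
```

If type(tables) prints a list in your case as well, tables[0] should be the DataFrame you are after, and the CSV round trip is only needed if you want a file on disk anyway.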