I have a list that is 5 rows by 5 columns.
I am trying to convert this list into a dataframe.
When I try to do so, it only grabs the first row.
My first attempt, with the shape set to (5, 5), failed:
df2 = pd.DataFrame(np.array(pdf_read).reshape(5,5),columns=list("abcde"))
When I switched it to this:
df2 = pd.DataFrame(np.array(pdf_read).reshape(1,5),columns=list("abcde"))
It only grabbed the first row.
Edit: Added Context
I am using the tabula module in Python to read a PDF file. The results are stored in the variable pdf_read.
When I run len(pdf_read), it returns a length of 1, but when I run print(pdf_read), the output shows 5 rows x 5 columns, which is very strange.
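A minimal sketch of why the two numbers disagree, using a hypothetical stand-in for the tabula output (a list holding one 5 x 5 DataFrame with made-up values): len() counts the items in the list, while the 5 x 5 dimensions belong to the DataFrame stored inside it.

```python
import pandas as pd

# Hypothetical stand-in for the tabula output: a list containing one 5x5 DataFrame.
table = pd.DataFrame([[r * 5 + c for c in range(5)] for r in range(5)])
pdf_read = [table]

# len() counts list elements, not table rows.
print(len(pdf_read))        # 1
# The 5x5 shape belongs to the DataFrame stored at index 0.
print(pdf_read[0].shape)    # (5, 5)
```

Printing the list prints the repr of each element, which is why the whole table appears even though the list itself has only one item.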
Edit #2: Datatypes
I ran the following:
print(type(pdf_read))
print(type(pdf_read[0]))
I got <class 'list'> and <class 'pandas.core.frame.DataFrame'>, respectively.
It seems I have a DataFrame inside a list.
I ran this code:
df = pd.DataFrame(
pdf_read[0],columns=["column_a","column_b","column_c","column_d","column_e"]
)
This returns a 5 x 5 DataFrame, but every value in every column is NaN.
Some progress, but I still need to figure out why the values are not populated.
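A likely explanation, sketched with made-up data: when the input to pd.DataFrame is already a DataFrame, the columns argument selects existing columns by label rather than renaming them. Assuming none of the labels column_a through column_e exist in the tabula output (the stand-in below uses integer labels 0 to 4, which is an assumption), every selected column is filled with NaN. Assigning to .columns renames the columns instead.

```python
import pandas as pd

# Hypothetical stand-in for pdf_read[0]; its column labels are 0..4.
original = pd.DataFrame([[r * 5 + c for c in range(5)] for r in range(5)])
new_names = ["column_a", "column_b", "column_c", "column_d", "column_e"]

# With DataFrame input, columns= selects (reindexes) by label.
# None of these labels exist in `original`, so every value becomes NaN.
selected = pd.DataFrame(original, columns=new_names)
print(selected.isna().all().all())  # True

# To rename instead, copy the frame and assign the new labels.
renamed = original.copy()
renamed.columns = new_names
print(renamed.iloc[0].tolist())     # [0, 1, 2, 3, 4]
```

The index is kept during the selection, which is why the result is still 5 rows by 5 columns even though no values carried over.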
EDIT: After some research, I found that the output pdf_read is a list of DataFrames. So to get the first DataFrame:
df = pdf_read[0]
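A short sketch of this fix, extended under one assumption: if pdf_read ever holds more than one table (for example when tabula reads several pages), pd.concat can stack them into a single DataFrame. The stand-in data below is hypothetical, and stacking only makes sense when the tables share the same columns.

```python
import pandas as pd

# Hypothetical stand-in: two tables with the same columns in a list.
page1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
page2 = pd.DataFrame({"a": [5], "b": [6]})
pdf_read = [page1, page2]

# Single table: take the DataFrame out of the list.
df = pdf_read[0]

# Several tables with matching columns: stack them into one DataFrame.
combined = pd.concat(pdf_read, ignore_index=True)
print(combined.shape)  # (3, 2)
```

ignore_index=True renumbers the rows 0..n-1 so the per-table indexes do not repeat.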