python, python-3.x, pandas, tabula-py

Why is the data from the PDF written to the first column?


I have a PDF file called Question.pdf with the following content:

[Image: contents of Question.pdf]

I am converting my PDF file to an .xlsx file with the Python tabula-py module. However, the result has an unwanted part in the first column of my Excel file (the part marked in red). How can I remove it?

[Image: resulting data.xlsx]

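import tabula

# read_pdf returns a list of DataFrames, one per table found on the page;
# [1] selects the second table (lattice=True uses the cell ruling lines)
df = tabula.read_pdf('Question.pdf', pages=1, lattice=True)[1]

# Replace the line breaks inside header cells with spaces
df.columns = df.columns.str.replace('\r', ' ')
# Drop every row that contains at least one empty (NaN) cell
data = df.dropna()
data.to_excel('data.xlsx', index=False)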
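To see what the two cleanup lines do without the PDF or the Java runtime that tabula-py needs, here is a small sketch on a hand-built DataFrame. The column names and values are hypothetical; they only mimic a table with `\r` line breaks inside header cells and one empty row, which is what the question's `str.replace` and `dropna()` calls are targeting.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one table returned by tabula.read_pdf(..., lattice=True):
# the header cells contain '\r', and the middle row is entirely empty.
df = pd.DataFrame(
    {
        "Product\rName": ["Apple", np.nan, "Pear"],
        "Unit\rPrice": [1.5, np.nan, 2.0],
    }
)

# Replace the carriage returns in the header text with spaces.
# regex=False treats '\r' as a literal string on every pandas version.
df.columns = df.columns.str.replace("\r", " ", regex=False)

# dropna() uses how="any" by default, so it drops every row with at least one NaN.
data = df.dropna()

print(list(data.columns))  # ['Product Name', 'Unit Price']
print(len(data))           # 2
```

Because of the `how="any"` default, a row with only one empty cell is dropped as well; `df.dropna(how="all")` keeps such rows and removes only completely empty ones.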

Solution

  • Try this while exporting, so that the column names are not written as the first row:

    data.to_excel('data.xlsx', index=False, header=False)  # header=False: skip the column-name row
    

    Hope this helps.
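The `header` argument decides whether pandas writes the column names as the first row of the output; the documented form for turning that off is `header=False`. The sketch below shows the difference using `to_csv`, which takes the same `header` flag, so that it runs without an Excel engine such as openpyxl. The DataFrame and the helper `first_line` are hypothetical, for illustration only.

```python
import io

import pandas as pd

# Hypothetical cleaned table, standing in for `data` from the question.
data = pd.DataFrame({"Product Name": ["Apple", "Pear"], "Unit Price": [1.5, 2.0]})


def first_line(frame: pd.DataFrame, **kwargs) -> str:
    """Write `frame` as CSV to an in-memory buffer and return the first line."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False, **kwargs)
    return buf.getvalue().splitlines()[0]


print(first_line(data))                # Product Name,Unit Price  (header row written)
print(first_line(data, header=False))  # Apple,1.5  (output starts with the data)
```

`index=False`, already present in the question's code, is the separate flag that suppresses the row-index column.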