I am new to Python. I have an issue converting a PDF file into CSV format. I used tabula for the conversion, but the resulting CSV file contains empty lines.
Sample PDF file that needs to be converted: sample pdf format
This is what I have tried:
import tabula

pdf_path = "/home/niranjan/code/html_spikes/statewise/cin/pdfreader/Manipur_company_1.pdf"
# Read every page of the PDF so the extracted tables can be inspected
doc = tabula.read_pdf(pdf_path, pages='all')
# Write the same tables straight to a CSV file
tabula.convert_into(pdf_path, "manipur.csv", output_format="csv", pages='all')
print(doc)
This is what the result looks like: converted CSV format
This is the result I was expecting: Expected CSV output
The converted CSV file leaves some cells empty, but I need every row to be complete and in the right order. I can't figure out how to do that.
Can anyone suggest a better way to do it?
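I solved this. Here is the code:
import csv

# "manipur_fixed.csv" is just an example name for the cleaned output.
with open("manipur.csv", newline="") as src, open("manipur_fixed.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        if not row[0]:
            # Empty first cell: the row holds only the first part of a wrapped name.
            name = row[1]
            row = next(reader)  # the next row carries the rest of the record
            row[1] = name + " " + row[1]
        writer.writerow(row)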
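The same idea can be written as a small reusable function. This is only a sketch: the name merge_wrapped_rows, its key_col and text_col parameters, and the sample rows are made up for illustration and do not come from the real Manipur PDF. It assumes, as in the code above, that a row with an empty first cell holds the first part of a wrapped name and that the rest of the record follows in the next complete row.

```python
import csv
import io


def merge_wrapped_rows(rows, key_col=0, text_col=1):
    """Fold rows whose key cell is empty into the next complete row.

    Hypothetical helper: key_col is the column that is empty on a
    wrapped fragment, text_col is the column holding the split text.
    """
    merged = []
    pending = []  # text fragments waiting for the next complete row
    for row in rows:
        row = list(row)
        if len(row) <= max(key_col, text_col):
            continue  # skip blank or too-short rows
        if not row[key_col].strip():
            # Fragment row: remember its text and move on.
            pending.append(row[text_col].strip())
            continue
        if pending:
            # Complete row: prepend the fragments collected so far.
            row[text_col] = " ".join(pending + [row[text_col].strip()])
            pending = []
        merged.append(row)
    # Fragments left over at the very end have no row to join and are dropped.
    return merged


# Made-up sample data imitating the shape of the converted CSV.
sample = io.StringIO(
    "1,ACME TRADING,Imphal\n"
    ",NORTH EAST,\n"
    "2,AGRO PRIVATE LIMITED,Imphal\n"
)
fixed = merge_wrapped_rows(csv.reader(sample))
for row in fixed:
    print(row)
```

Unlike the loop above, this version also handles a name that is split over more than two rows, because it collects every fragment until the next complete row appears.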