python pandas csv export-to-csv tabula-py

Empty lines appear in the CSV file when converting a PDF document to CSV


I am new to Python. I am using tabula to convert a PDF file into CSV format, but the converted CSV file contains empty lines.

The sample PDF file that needs to be converted: sample pdf format

This is what I have tried:

import tabula

pdf_path = "/home/niranjan/code/html_spikes/statewise/cin/pdfreader/Manipur_company_1.pdf"

# Read the tables from every page of the PDF
doc = tabula.read_pdf(pdf_path, pages="all")
# Write the tables from every page straight to a CSV file
tabula.convert_into(pdf_path, "manipur.csv", output_format="csv", pages="all")
print(doc)

This is what the converted CSV looks like: converted CSV format

The result I was expecting: Expected CSV output

The converted CSV file leaves some cells empty, because a single entry is split across two rows, but I need each record on one complete row in the original order. I can't figure out how to do it.

Can anyone suggest a better way to do this?


Solution

  • I solved this. Here is the code:

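    import csv

    rows = []
    with open("manipur.csv", newline="") as src:
        reader = csv.reader(src)
        for row in reader:
            if not row[0]:
                # First cell is empty: this row only holds the start of a name
                name = row[1]
                row = next(reader, None)  # the following row holds the rest
                if row is None:
                    break
                row[1] = name + " " + row[1]
            rows.append(row)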
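
For anyone who wants to reuse this, here is a slightly more general sketch of the same idea. It assumes, as in the code above, that a split row has an empty first cell and carries the start of the name in the second cell, and that the row after it holds the rest. The names `merge_split_rows`, `clean_csv`, `key_col` and `text_col` are made up for illustration and are not part of tabula or the `csv` module. Unlike the code above, it also skips rows that are completely blank and handles several fragment rows in a row.

```python
import csv
from typing import Iterable, Iterator, List


def merge_split_rows(
    rows: Iterable[List[str]], key_col: int = 0, text_col: int = 1
) -> Iterator[List[str]]:
    """Yield rows with leading fragment rows folded into the row that follows."""
    pending: List[str] = []  # fragments waiting for their complete row
    for row in rows:
        if not any(cell.strip() for cell in row):
            continue  # skip rows that are entirely blank
        if not row[key_col].strip():
            # Assumed fragment row: keep its text and wait for the full row.
            pending.append(row[text_col].strip())
            continue
        if pending:
            row = list(row)  # copy so the caller's row is not modified
            row[text_col] = " ".join(pending + [row[text_col].strip()])
            pending = []
        yield row
    # Fragments with no complete row after them are dropped.


def clean_csv(src_path: str, dst_path: str) -> None:
    """Read src_path, merge split rows, and write the result to dst_path."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(merge_split_rows(csv.reader(src)))
```

It could be called as `clean_csv("manipur.csv", "manipur_clean.csv")`, where the output file name is only an example. If your PDF splits the text the other way round (the complete row first, then the leftover fragment), the merge direction would have to be reversed.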