I'm working on a project using Camelot to read tables from PDFs and images. We need to find the boundary coordinates of the table cells.
The Camelot documentation lists its core classes at the link below, and I think the answer might be in there, but I'm not seeing it. I see functions that take coordinates as parameters, but none that return them.
https://camelot-py.readthedocs.io/en/master/_modules/camelot/core.html
In short, I need a list of every cell together with its coordinates. How can I do that?
What you are looking for is table.cells, a list of rows where each row is a list of Cell objects.
An example of usage:
import camelot

# Parse every page; each detected table exposes a .cells attribute
tables = camelot.read_pdf('YOUR-PDF-FILEPATH', pages='all')
print(tables[0].cells)
Output:
[[<Cell x1=218.65 y1=698.47 x2=267.14 y2=722.23>,
<Cell x1=267.14 y1=698.47 x2=296.18 y2=722.23>,
<Cell x1=296.18 y1=698.47 x2=324.98 y2=722.23>,
<Cell x1=324.98 y1=698.47 x2=353.78 y2=722.23>,
<Cell x1=353.78 y1=698.47 x2=382.83 y2=722.23>,
<Cell x1=382.83 y1=698.47 x2=411.63 y2=722.23>,
<Cell x1=411.63 y1=698.47 x2=440.43 y2=722.23>,
<Cell x1=440.43 y1=698.47 x2=469.23 y2=722.23>,
<Cell x1=469.23 y1=698.47 x2=500.91 y2=722.23>,
<Cell x1=500.91 y1=698.47 x2=529.96 y2=722.23>],...]
List of cell attributes (obtained with dir(tables[0].cells[0][0])):
bottom, bound, hspan, lb, left, lt, rb, right, rt, text, top, vspan, x1, x2, y1, y2.
You can experiment with them to see what each one returns.
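To get the flat list of cells and coordinates that the question asks for, you can walk the nested list. The following is a minimal sketch, not part of Camelot itself: cell_boxes is a hypothetical helper name, and it relies only on the x1, y1, x2, y2 attributes listed above.

```python
def cell_boxes(table):
    """Flatten table.cells into (row, col, x1, y1, x2, y2) tuples.

    `table` is assumed to be any object whose .cells attribute is a list
    of rows, each row being a list of objects with x1, y1, x2, y2.
    """
    boxes = []
    for row_idx, row in enumerate(table.cells):
        for col_idx, cell in enumerate(row):
            boxes.append((row_idx, col_idx, cell.x1, cell.y1, cell.x2, cell.y2))
    return boxes


# Usage with Camelot (assumes camelot is installed and the path is valid):
# import camelot
# tables = camelot.read_pdf('YOUR-PDF-FILEPATH', pages='all')
# for row_idx, col_idx, x1, y1, x2, y2 in cell_boxes(tables[0]):
#     print(row_idx, col_idx, x1, y1, x2, y2)
```

Each tuple carries the row and column index of the cell followed by its bounding box, so the result can be written straight to CSV or used to draw rectangles over the page.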