python-3.x, pdf, python-camelot

How to get a list of table objects (cells) from Camelot


I'm working on a project using Camelot to read tables from PDFs and images. We need to find the boundary coordinates of the table cells.

Camelot documents its core classes at the link below, and I think the answer might be there, but I'm not seeing it. I see functions that take coordinates as parameters, but none that return them.

https://camelot-py.readthedocs.io/en/master/_modules/camelot/core.html

In short, I need a list of every cell together with its coordinates. How can I do that?


Solution

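  • You are looking for table.cells, a list of rows in which each row is a list of Cell objects.

    An example of usage:

    import camelot

    # Parse every page of the PDF; the result is a list-like collection of tables
    tables = camelot.read_pdf('YOUR-PDF-FILEPATH', pages='all')

    # Cells of the first detected table, grouped by row
    print(tables[0].cells)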
    

    output:

    [[<Cell x1=218.65 y1=698.47 x2=267.14 y2=722.23>,
      <Cell x1=267.14 y1=698.47 x2=296.18 y2=722.23>,
      <Cell x1=296.18 y1=698.47 x2=324.98 y2=722.23>,
      <Cell x1=324.98 y1=698.47 x2=353.78 y2=722.23>,
      <Cell x1=353.78 y1=698.47 x2=382.83 y2=722.23>,
      <Cell x1=382.83 y1=698.47 x2=411.63 y2=722.23>,
      <Cell x1=411.63 y1=698.47 x2=440.43 y2=722.23>,
      <Cell x1=440.43 y1=698.47 x2=469.23 y2=722.23>,
      <Cell x1=469.23 y1=698.47 x2=500.91 y2=722.23>,
      <Cell x1=500.91 y1=698.47 x2=529.96 y2=722.23>],...]
    

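    Each Cell exposes these attributes (listed with dir(tables[0].cells[0][0])): bottom, bound, hspan, lb, left, lt, rb, right, rt, text, top, vspan, x1, x2, y1, y2. Here (x1, y1) is the left-bottom corner and (x2, y2) the right-top corner in PDF coordinate space, whose origin is at the bottom-left of the page; lb, lt, rb and rt hold the four corners as (x, y) tuples.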

    You can try them and play with them.
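
If a flat list of cells with their coordinates is easier to work with than the nested rows, something along these lines may help. This is only a sketch: cell_boxes is a hypothetical helper name, not part of Camelot, and it relies solely on the x1, y1, x2, y2 and text attributes listed above. To keep it runnable without a PDF, it is demonstrated on a stand-in object built with SimpleNamespace, and the "A"/"B" text values are placeholders.

```python
from types import SimpleNamespace


def cell_boxes(table):
    """Flatten table.cells (a list of rows of cells) into one list of dicts.

    Hypothetical helper, not a Camelot function. It only assumes each cell
    has x1, y1, x2, y2 and text attributes.
    """
    boxes = []
    for row_idx, row in enumerate(table.cells):
        for col_idx, cell in enumerate(row):
            boxes.append({
                "row": row_idx,
                "col": col_idx,
                "x1": cell.x1, "y1": cell.y1,  # left-bottom corner
                "x2": cell.x2, "y2": cell.y2,  # right-top corner
                "text": cell.text,
            })
    return boxes


# Stand-in for a parsed table so the sketch runs without a PDF.
# The coordinates mirror the first two cells of the output above;
# the text values are made up for illustration.
fake_table = SimpleNamespace(cells=[[
    SimpleNamespace(x1=218.65, y1=698.47, x2=267.14, y2=722.23, text="A"),
    SimpleNamespace(x1=267.14, y1=698.47, x2=296.18, y2=722.23, text="B"),
]])

boxes = cell_boxes(fake_table)
for box in boxes:
    print(box)
```

With a real document, the same helper should work on a parsed table, for example cell_boxes(tables[0]) after calling camelot.read_pdf as shown in the answer.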