
Check if cell content contains specific shape


I have a table in which some cells contain text with a line drawn above it. The line changes the meaning of the text, so for each cell I want to determine whether it contains a line shape.

From what I've seen, there is cell.part.inline_shapes, but it returns the same result for every cell in the table, and it doesn't specify the actual shape type (line, rectangle, square, etc.).

For example, in the following table only cell [1, 0] contains a line:

Example table

def is_line(shape):
    #TODO implement
    pass

def is_containing_line(cell):
    # TODO: check if shape is in current cell, as cell.part.inline_shapes are the same in every table cell    
    cell_shapes = cell.part.inline_shapes
    return any(is_line(shape) for shape in cell_shapes)


[i for i, cell in enumerate(table.columns[column_index].cells[starting_row:])
 if is_containing_line(cell)]

Output of print(cell._tc.xml) for a cell that contains a line shape:

<w:tc xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
  xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
  xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
  <w:tcPr>
    <w:tcW w:w="1734" w:type="dxa" />
    <w:tcBorders>
      <w:top w:val="single" w:sz="12" w:space="0" w:color="000000" />
      <w:bottom w:val="nil" />
    </w:tcBorders>
  </w:tcPr>
  <w:p>
    <w:pPr>
      <w:pStyle w:val="TableParagraph" />
      <w:spacing w:before="10" />
      <w:rPr>
        <w:b/>
        <w:sz w:val="2" />
      </w:rPr>
    </w:pPr>
  </w:p>
  <w:p>
    <w:pPr>
      <w:pStyle w:val="TableParagraph" />
      <w:spacing w:line="20" w:lineRule="exact" w:before="0" />
      <w:ind w:left="755" />
      <w:rPr>
        <w:sz w:val="2" />
      </w:rPr>
    </w:pPr>
    <w:r>
      <w:rPr>
        <w:sz w:val="2" />
      </w:rPr>
      <w:pict>
        <v:group style="width:11.1pt;height:.6pt;mso-position-horizontal-relative:char;mso-position-vertical-relative:line" coordorigin="0,0" coordsize="222,12">
          <v:rect style="position:absolute;left:0;top:0;width:222;height:12" filled="true" fillcolor="#000000" stroked="false">
            <v:fill type="solid" />
          </v:rect>
        </v:group>
      </w:pict>
    </w:r>
    <w:r>
      <w:rPr>
        <w:sz w:val="2" />
      </w:rPr>
    </w:r>
  </w:p>
  <w:p>
    <w:pPr>
      <w:pStyle w:val="TableParagraph" />
      <w:spacing w:before="0" />
      <w:ind w:left="453" w:right="449" />
      <w:jc w:val="center" />
      <w:rPr>
        <w:sz w:val="16" />
      </w:rPr>
    </w:pPr>
    <w:r>
      <w:rPr>
        <w:sz w:val="16" />
      </w:rPr>
      <w:t>EN</w:t>
    </w:r>
  </w:p>
</w:tc>

Output of print(cell._tc.xml) for a cell that doesn't contain a line shape:

<w:tc xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
  xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
  xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
  <w:tcPr>
    <w:tcW w:w="1734" w:type="dxa" />
  </w:tcPr>
  <w:p>
    <w:pPr>
      <w:pStyle w:val="TableParagraph" />
      <w:spacing w:before="55" />
      <w:ind w:left="453" w:right="448" />
      <w:jc w:val="center" />
      <w:rPr>
        <w:sz w:val="16" />
      </w:rPr>
    </w:pPr>
    <w:r>
      <w:rPr>
        <w:sz w:val="16" />
      </w:rPr>
      <w:t>BOUT</w:t>
    </w:r>
  </w:p>
</w:tc>


Solution

  • There is no API support for this in python-docx.

    However, this function will tell you whether a drawing (inline shape) is present in a paragraph. Note that, depending on the Word version, such an item may appear as a <w:pict> element (legacy VML markup, as in your XML) instead of a <w:drawing> element (DrawingML):

    def has_inline_shape(paragraph):
        """Return True if `paragraph` contains an inline shape."""
        return (
            bool(paragraph._p.xpath(".//w:drawing"))
            or bool(paragraph._p.xpath(".//w:pict"))
        )
    

    You can apply it to each paragraph in a cell to determine whether the cell contains such a shape:

    def cell_contains_inline_shape(cell):
        """Return True if an inline-shape appears in `cell`."""
        return any(
            has_inline_shape(paragraph)
            for paragraph in cell.paragraphs
        )