python pywin32 visio

How to extract text from a shape in Visio with pywin32?


I'm parsing many Visio files to extract the text written on specific shapes.

I use the script from this page. I just added a few lines to extract the information:

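# shps and parse_names() come from the linked script
sshpListName, sshpListText = [], []
sshpListShape, sshpListType = [], []

for shp in shps:
    if shp.Type == 2:  # 2 = visTypeGroup
        parse_names(shp, 2)
        print(shp.Name + " is group, which contain shapes:")
        print("-----------------------------------------")
        for sshp in shp.Shapes:
            # Record name, text, sub-shapes and type of each sub-shape
            sshpListName.append(sshp.Name)
            sshpListText.append(sshp.Text)
            sshpListShape.append(sshp.Shapes)
            sshpListType.append(sshp.Type)

            parse_names(sshp, 1)
        print("-----------------------------------------")
    else:
        parse_names(shp, 0)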

I obtain this:

My Shape.62
My Shape.62 is group, which contain shapes:
-----------------------------------------
    Sheet_green_frame.5
    Sheet_frameID.9 
    Sheet_funID.6 
-----------------------------------------

My shapes are green rectangles with a single text zone in the bottom left corner, and that is the text I want. When I print the lists, I obtain:

Names = ['Sheet_green_frame.5', 'Sheet_frameID.9', 'Sheet_funID.6']
Text = ['', '￼', '￼']
Shape = [<win32com.gen_py.Microsoft Visio 16.0 Type Library.IVShapes instance at 0x1552579577936>, <win32com.gen_py.Microsoft Visio 16.0 Type Library.IVShapes instance at 0x1552579687424>, <win32com.gen_py.Microsoft Visio 16.0 Type Library.IVShapes instance at 0x1552579577024>]
Type = [3, 3, 3]

When I open the file in Visio, I can see the text in the data. So I don't understand where this text ends up when I read the shapes through pywin32, because with other shapes I do get some information.

P.S.: The ￼ character only shows up in the Stack Overflow post. In my console it is a whitespace: ['', ' ', ' '].


Solution

  • The [obj] placeholder means that your shapes use text fields to display their text. Shape.Text returns only the placeholder for a field, not its expanded value. To get the visible (displayed) text in this case, instead of sshpListText.append(sshp.Text) you can use:

    sshpListText.append(sshp.Characters.TextAsString)

    More about text fields here
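
To show how this could fit into the loop from the question, here is a minimal sketch. The helper names visible_text and collect_texts are made up for illustration, and the stand-in objects (including the text "F-001") are invented so the snippet runs without Visio. Only Name, Type, Shapes and Characters.TextAsString are assumed from the Visio object model.

```python
from types import SimpleNamespace

VIS_TYPE_GROUP = 2  # value of visTypeGroup, as used in the question


def visible_text(shape):
    """Return the displayed text of a shape, with text fields expanded."""
    return shape.Characters.TextAsString


def collect_texts(shapes):
    """Walk shapes and the sub-shapes of groups, returning (name, text) pairs."""
    result = []
    for shp in shapes:
        result.append((shp.Name, visible_text(shp)))
        if shp.Type == VIS_TYPE_GROUP:
            result.extend(collect_texts(shp.Shapes))
    return result


# Stand-in objects that mimic the few attributes used above (hypothetical data).
leaf = SimpleNamespace(
    Name="Sheet_frameID.9",
    Type=3,
    Shapes=[],
    Characters=SimpleNamespace(TextAsString="F-001"),
)
group = SimpleNamespace(
    Name="My Shape.62",
    Type=VIS_TYPE_GROUP,
    Shapes=[leaf],
    Characters=SimpleNamespace(TextAsString=""),
)

for name, text in collect_texts([group]):
    print(repr(name), repr(text))
```

With a real document you would pass the page's shape collection instead of the stand-ins, for example collect_texts(page.Shapes), where page is whatever page object your script already has.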