Search code examples
vbaexcelcharts

Match labels to arrows in Excel flowchart using VBA


I'm writing a code generation tool using VBA in Excel (don't ask why—long story). I need to be able to "parse" a flowchart.

The problem is that Excel allows shapes to contain text, with the exception of connectors: lines and arrows can't contain text. To label an arrow, you just put a text box on top of it—but the box isn't "attached" to the arrow in a way that VBA can easily capture.

For example, a user might draw something like this:

example flowchart: three boxes, two arrows, two more boxes as arrow labels

Within my VBA code, I can use ActiveSheet.Shapes to find that the flowchart contains seven shapes: there are five boxes (the two labels are just boxes with no border) and two arrows. Then Shape.TextFrame2 will tell me what's written inside each box, and Shape.ConnectorFormat will tell me which box goes at the start and end of each arrow.

What I need is code that can deduce:

  • Label A belongs to the arrow from Box 1 to Box 2
  • Label B belongs to the arrow from Box 1 to Box 3

I can think of three ways of doing this, none of them satisfactory.

  1. Ask the user to group each label with its corresponding arrow.

  2. Find out the coordinates of the endpoints of each arrow, then calculate which arrows pass through which labels.

  3. Find out the coordinates of the corners of each box, then calculate which labels lie between which pairs of boxes.

Method 1 makes things easier for the programmer but harder for the user. It opens up a lot of potential for user error. I don't see this as an acceptable solution.

Method 2 would be reasonably easy to implement, except that I don't know how to find out the coordinates!

Method 3 is doable (Shape.Left etc will give the coordinates) but computationally quite messy. It also has potential for ambiguity (depending on placement, the same label may be associated with more than one arrow).

Note that methods 2 and 3 both involve trying to match every label with every arrow: the complexity is quadratic. Typical applications will have 10–50 arrows, so this approach is feasible, if somewhat inelegant.

Does anyone have a better idea? Ideally it would be something that doesn't involve coordinate geometry and complicated logic, and doesn't involve asking users to change the way they draw flowcharts.


Edited to add: example 2 in response to Tim Williams

flowchart with one box, two arrows, and a label whose bounding box intersects the bounding boxes of both arrows

Here's a label whose bounding box intersects the bounding box of both arrows, and whose midpoint isn't inside the bounding box of either arrow. Visually it's easy for a human to see that it belongs with the left arrow, but programmatically it's hard to deal with. If I can find out the coordinates of the arrows' endpoints, then I can calculate that one arrow passes through the label's box but the other doesn't. But if all I have is the bounding rectangles of the arrows, then it doesn't work.


Solution

  • You can find the coordinates of the arrow's endpoints as follows.

    First of all, the .Left, .Top, .Width and .Height properties describe the bounding rectangle of the arrow, as Tim Williams points out.

    Next, check the .HorizontalFlip and .VerticalFlip properties. If both are false, then the arrow runs from top left to bottom right in its bounding rectangle. That is, the beginning of the arrow has coordinates (.Left,.Top) and the end has coordinates (.Left+.Width,.Top+.Height).

    If either *.Flip is true, then the coordinates need to be swapped around as appropriate. E.g., if .HorizontalFlip is true but .VerticalFlip false, then the arrow runs from (.Left+.Width,.Top) to (.Left,.Top+.Height).

    As far as I can tell, this is not documented anywhere on MSDN. Thanks to Andy Pope for mentioning it at excelforums.com.

    Given this, method 2 seems like the best approach.