adobe-indesign idml

IDML: Extract text content in proper order


I am trying to extract the text content from IDML files.

What I am currently doing is:

  1. Extract the XML files, open designmap.xml, and look for the spreads that make up the document.

  2. Spreads are referenced in designmap.xml by <idPkg:Spread> elements.

  3. In each spread, I look for <TextFrame> elements and fetch the corresponding content via the ParentStory attribute.
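The three steps above can be sketched roughly as follows. This is only an illustration: the function names are made up for this example, and it assumes the package is a plain zip archive whose designmap.xml lists each spread in a src attribute on an element with the local name Spread.

```python
import zipfile
import xml.etree.ElementTree as ET


def spread_sources(designmap_xml):
    """Return the src path of every spread listed in designmap.xml, in file order."""
    root = ET.fromstring(designmap_xml)
    # Match on the local name only, so the sketch does not hard-code
    # the namespace URI bound to the idPkg prefix.
    return [el.get("src") for el in root.iter() if el.tag.endswith("}Spread")]


def parent_stories(spread_xml):
    """Return the ParentStory id of each TextFrame in one spread file, in file order."""
    root = ET.fromstring(spread_xml)
    return [frame.get("ParentStory") for frame in root.iter("TextFrame")]


def frames_in_package(idml_path):
    """Yield (spread path, [ParentStory ids]) for each spread in an IDML package."""
    with zipfile.ZipFile(idml_path) as package:
        for src in spread_sources(package.read("designmap.xml")):
            yield src, parent_stories(package.read(src))
```

The story ids come back in the order the frames appear in the spread file, which is exactly the order I am getting.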

The problem is that this text does not seem to be in order. I have a simple IDML file with one text frame for the title and one text frame that covers the contents of the page. When I extract the text, the body is fetched first, and then the header.

Is there any way to extract the content in the same order in which it appears on the page?

Thanks.

PS - In the <TextFrame> element, the NextTextFrame and PreviousTextFrame attributes are both set to 'n'. I'm not sure what that means, or whether these values can help. Apologies if I'm missing something very basic here; I'm new to InDesign and IDML.


Solution

  • The order of TextFrame elements in an IDML Spread indicates their z-order depth, not any kind of reading order on the page. In the document you describe, either the depth was manipulated, or the body element was added to the document before the header: either way it is at a lower depth.

    The only way to determine reading order in the way I think you want is to work out the position of each element on the page; once you know this, you can presumably work from top to bottom and/or left to right (or right to left, depending on the language). This can be a bit tricky, but it is basically the sum of the GeometricBounds and ItemTransform parameters of the Spread > Page > PageItem hierarchy. See my answer here for more detail: https://stackoverflow.com/a/12490600/1014822
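As a rough illustration of sorting by position, here is a minimal sketch. It rests on several assumptions that are not part of the answer above: each TextFrame stores its outline as PathPointType elements with an Anchor="x y" attribute, ItemTransform holds six numbers "a b c d tx ty" in the usual affine convention, and y grows downward. It also ignores the transforms of the enclosing Spread, Page, and any Group, so it only compares frames that sit directly in the same spread. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def top_left(frame):
    """Approximate top-left corner of a frame in its parent's coordinates."""
    # Assumed layout of ItemTransform: "a b c d tx ty".
    a, b, c, d, tx, ty = (
        float(v) for v in frame.get("ItemTransform", "1 0 0 1 0 0").split()
    )
    points = []
    for point in frame.iter("PathPointType"):
        x, y = (float(v) for v in point.get("Anchor").split())
        # Apply the affine transform to each anchor point.
        points.append((a * x + c * y + tx, b * x + d * y + ty))
    if not points:
        # No path data found: fall back to the translation part alone.
        return (tx, ty)
    return (min(p[0] for p in points), min(p[1] for p in points))


def frames_in_reading_order(spread_xml):
    """Return TextFrame elements sorted top to bottom, then left to right."""
    root = ET.fromstring(spread_xml)
    frames = list(root.iter("TextFrame"))
    return sorted(frames, key=lambda f: (top_left(f)[1], top_left(f)[0]))
```

For a right-to-left layout, the second sort key could be negated; multi-column pages would need a smarter grouping than a plain sort.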

    Alternatively, if you have control over the document authoring process, you could ensure authors use depth to indicate reading order, which will save you a bit of coding. But note that IDML has a concept of Layers as well, which further complicates the depth issue.

    NextTextFrame and PreviousTextFrame are only used for linked frames, where a story flows from one frame to another. A value of n indicates that there is no linked frame in that direction.
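To illustrate how these two attributes chain frames together, here is a minimal sketch that groups the frames of one spread into linked chains. It assumes each TextFrame has a Self id and that NextTextFrame and PreviousTextFrame hold either another frame's Self id or "n"; the function name is made up for this example. A chain that continues on another spread simply stops at the last frame found in this one.

```python
import xml.etree.ElementTree as ET


def frame_chains(spread_xml):
    """Return lists of frame ids, each list following one chain of linked frames."""
    root = ET.fromstring(spread_xml)
    frames = {f.get("Self"): f for f in root.iter("TextFrame")}
    chains = []
    for frame_id, frame in frames.items():
        # Start only from frames that have no predecessor.
        if frame.get("PreviousTextFrame", "n") != "n":
            continue
        chain, current, seen = [], frame_id, set()
        # Follow NextTextFrame until "n", an unknown id, or a repeated id.
        while current in frames and current not in seen:
            seen.add(current)
            chain.append(current)
            current = frames[current].get("NextTextFrame", "n")
        chains.append(chain)
    return chains
```

In the document from the question, where every frame has both attributes set to n, this yields one single-frame chain per text frame, so the attributes give no ordering information there.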