I'm trying to read a PDF file where each page is divided into a 3x3 grid of blocks of information, of the form
A | B | C
D | E | F
G | H | I
Each entry is broken into multiple lines. A simplified example of one entry is this card; the other 8 slots would hold similar cards.
I'd like to be able to read A, then B, then C, and so on; however, I could survive if I read the first line of A, B, and C, then the second line of A, B, and C, etc. I've looked at pdfminer and pypdf, but I haven't seen anything that fits what I'm looking for. The answer here works fairly well, but the order of columns routinely gets distorted.
In the second answer here, replace
self.rows = sorted(self.rows, key = lambda x: (x[0], -x[2]))
by
self.rows = sorted(self.rows, key = lambda x: (x[0], -x[2], x[1]))
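To see what the extra key does, here is a minimal sketch. It assumes each row is a tuple of (page number, left x, bottom y, text); that layout and the sample coordinates are made up for illustration and are not taken from the linked answer. With only (x[0], -x[2]) as the key, rows that share a page and a y value keep whatever order they were collected in, because Python's sort is stable. Adding x[1] breaks those ties from left to right.

```python
# Hypothetical rows: (page_number, x0, y0, text). This tuple layout and the
# coordinates are assumptions for illustration only.
rows = [
    (0, 400.0, 700.0, "C line 1"),
    (0, 50.0, 700.0, "A line 1"),
    (0, 225.0, 700.0, "B line 1"),
    (0, 225.0, 688.0, "B line 2"),
    (0, 50.0, 688.0, "A line 2"),
    (0, 400.0, 688.0, "C line 2"),
]


def sort_rows(rows, use_x_tiebreak=True):
    """Sort by page, then top to bottom, then optionally left to right."""
    if use_x_tiebreak:
        key = lambda r: (r[0], -r[2], r[1])
    else:
        key = lambda r: (r[0], -r[2])
    return sorted(rows, key=key)


# Without the tie-break, equal-y rows stay in their collection order.
old_order = [r[3] for r in sort_rows(rows, use_x_tiebreak=False)]
# With the tie-break, equal-y rows are ordered by their left edge.
new_order = [r[3] for r in sort_rows(rows)]

print(old_order)
print(new_order)
```

This gives the line-by-line order (first line of A, B, C, then the second line of each). The tie-break only takes effect when the y values are exactly equal, so lines whose baselines differ slightly would still sort by y first.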
Very important: See the last paragraph of this answer.