I am using the Google Document AI API to OCR some documents, but I have an issue.
OCR works well, but I would also like it to tell me the orientation of the document: whether it is upside down or rotated left or right.
Since the OCR works, the process must be capable of knowing this information. It is, however, not returned in the output JSON of the API. As far as I can see, it should be in the page "orientation" property, but I always get "orientation": "PAGE_UP".
For the incorrect (upside-down) sample I also notice a "transforms" property:
"transforms": [
  {
    "rows": 2,
    "cols": 3,
    "type": 6,
    "data": "AAAAAAAA8L8HXBQzJqahvAAAAAAAgFlAB1wUMyamoTwAAAAAAADwv/7//////zZA"
  }
]
However, I don't know how to translate this into rotation information, and the "data" value is not the same for all upside-down documents.
Request Info
POST https://eu-documentai.googleapis.com/v1/projects/xxxxx/locations/eu/processors/xxxxx:process
Processor: Document OCR, pretrained-ocr-v1.2-2022-11-10
Upright Sample Request
{
"rawDocument": {
"content": "iVBORw0KGgoAAAANSUhEUgAAAGYAAAAXCAYAAAD5oToGAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAKQSURBVGhD7ZlPsikxFIfjLcQAy2CKJWBkygKYGJqwAKZGWAKGWAZKKftw+8vN8XL76tbtuq9T9fJVpfokkpM/v85JuuRuAcrjHH/M0+MYXpgMyeVyxvqOF8ZRvDCO4oVxlLcIU6/X1Xg8Njk32O/3sTHcdRIJUyqV9CQfJRYgjtPppOstl0tT8hyEJqWBMabpIw3vevHwkXReiYQ5HA6Kzx355Nntdvd8uVzWZVEUCgVdr9FomJLnrFYrnf5rgkVLBU0CYUzuk1qtdhuNRrdisah/Jx2PR/Pr1zY8pY5dbtPpdHQSbL92uWD7Iy0Wi3s/2FIebsu45Tf6eITdN4l5gu2XJPPAJ0mgT/I87frSdxRvE4ZyEUMGI9htmCiTioP2sogshO0rirBfEUb8MDZ7HOEx2n2GoZ4IAuJb5it5kH7sl0MIz8X+LczbbmVBpzpsQavV0uEviu12a6xkrNdrY6VnMpnoJ2MLFkVdr1edn06najAYaBsYM2VJmM/nKhDxPl/CefBi6DOOskAU1Ww2Vbvd1vYr/Np1OXhzjPUVBNtsNvpCkOQg7PV6WnS5bHCZ+AmXy8VYSlUqlbtf7DQgorQl2fPlPEUosV/h14SJQy4TPJPcdhCH+ghUrVZN6c8JQo32a6eksGPCbUUE5sQtUexXyEQYQQaflHw+b6zv4MveDc9gYYfDocnFQ3g6n88m9zfsPdq9lPX7fR1CZ7OZtqUe448L8Tb/VBj5ppEE7IY4CHdSn7hNGHwE8ZxFoN6zbyuQs0d8k6JCqwhBHc4RzhTODsKV3Z75saPlvKUeL0C329V+2FGEPOpKWRT+/5gMQaCo5c80lHmi8cI4ihfGUbwwGRJ3vHthHMUL4yRKfQDryk980ii4pQAAAABJRU5ErkJggg==",
"mimeType": "image/png"
},
"processOptions": {
"ocrConfig": {
"enableImageQualityScores": true
}
}
}
Upside-down Sample Request
{
"rawDocument": {
"content": "iVBORw0KGgoAAAANSUhEUgAAAGYAAAAXCAYAAAD5oToGAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAKMSURBVGhD7ZlBkmkxFIaj16GUwjIwxBIwMmUBTAxNWABTIywBAxOWgYGNvM5/6v63Iu1eafqR1y9fVSrJkZyc5CQnNyXzR6MC3vER5QHPCI7xlETHZDKZqBR4B+HEeEpwjKcEx3hKqmN6vZ7cNeZ902g0RE4mk4kqlUrqfD7HbZlWq5W0QU7Z4XAQmQ30mn1RJxjDrAPqQmIfjmdj20ZdmIc5FxPbHujgWARlcz637LT7uJLqmOl0qvDMKRaL8aQhm81mMiCMHQwGarvdqkKhIG2ZTqeTarVa0q7ZbIqs2+2qxWIhemzW6/VVf5C0aCblclnaj8djNZ/PI+k1tVpNLZfLWDfGSgMLDNgeCfN7JU6hDCficrlIGQZiETqdjiwcyreMhsx2RD6fF2e6MBwOZQO4ksvl1PF4jGpfof2upOl6BQ/dMf1+X3IYz7IZrpi+s7B230qlEv3yPDjRONnQy9OQBuaEzYj2yN/BQ45hWEO4MuM6Qp55/HFivsN+v4/7okxwGp7BDLNwkItzGFoRBpOcg/n+LR5yDO4OxHPEbZQRnnCPwFG8DCHDiWm321K/hx32RqNR7NhsNqs2m00cBl3uniTq9XpUcqNarUalr8BhtJn37Y+hd8VN8JOehORMeqFEhpygjgT0Lr9qrx0ncn0PXcn1ThO5DeRsQ50EY/I36EWO8Vg2k43Z1xwbcnMuxJ4H6qac6I0Yt4FezNO22+7jSqpjCAbDoM9yy/BHgX1cMAAHJTn8VfykYx4KZb8FvDnsd0ca/MC592XJt4v5EcO668dE4v8xUJLwkxfAPr0b5R3jC/io2O12d99JLvyzjvnt/NehzGeCYzwlOMZTEh0T7pf3Ek6Mlyj1CUzhvEEr5fipAAAAAElFTkSuQmCC",
"mimeType": "image/png"
},
"processOptions": {
"ocrConfig": {
"enableImageQualityScores": true
}
}
}
Because my integration platform (in which I'm doing these developments) doesn't work well with Python, I ended up creating a simplified Java version of @Patrick Richter's answer. All credit to him.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Returns the rotation, in degrees, encoded in a 2x3 affine transform matrix.
public static double getRotationFromTransforms(String base64Data, int rows, int cols, int type) {
    if (rows != 2 || cols != 3)
        throw new IllegalArgumentException("Matrix other than 2 by 3 is not supported");
    // Type 6 is a matrix of 64-bit doubles (OpenCV's CV_64F)
    if (type != 6)
        throw new IllegalArgumentException("Type other than 6 is not supported");
    byte[] data = Base64.getDecoder().decode(base64Data);
    // Row-major little-endian doubles: a = m[0][0] at byte 0, c = m[1][0] at byte 24
    double a = ByteBuffer.wrap(data, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getDouble();
    double c = ByteBuffer.wrap(data, 24, 8).order(ByteOrder.LITTLE_ENDIAN).getDouble();
    // For a pure rotation a = cos(theta) and c = sin(theta), so atan2(c, a) yields theta
    return Math.toDegrees(Math.atan2(c, a));
}
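For reference, here is a minimal, self-contained sketch of how this could be applied to the "transforms" data from the upside-down sample in the question. The class name `TransformRotationDemo` and the helpers `decodeMatrix`, `rotationDegrees` and `snapToQuadrant` are names I made up for illustration; they are not part of the Document AI API. It assumes the matrix is a row-major 2x3 array of little-endian doubles and that the page is only ever rotated in steps of roughly 90 degrees, so the angle can be snapped to 0, 90, 180 or 270.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

public class TransformRotationDemo {

    // Decodes the base64 payload into doubles (assumes little-endian 64-bit values, i.e. type 6).
    public static double[] decodeMatrix(String base64Data) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getDecoder().decode(base64Data))
                .order(ByteOrder.LITTLE_ENDIAN);
        double[] values = new double[buf.remaining() / Double.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getDouble();
        }
        return values;
    }

    // Rotation in degrees from the first column of a row-major 2x3 matrix:
    // m[0] is row 0 / column 0, m[3] is row 1 / column 0.
    public static double rotationDegrees(double[] m) {
        return Math.toDegrees(Math.atan2(m[3], m[0]));
    }

    // Snaps an angle to the nearest multiple of 90 and normalises it into [0, 360).
    public static int snapToQuadrant(double degrees) {
        int snapped = (int) (Math.round(degrees / 90.0) * 90);
        return ((snapped % 360) + 360) % 360;
    }

    public static void main(String[] args) {
        // "data" value from the upside-down sample in the question
        String data = "AAAAAAAA8L8HXBQzJqahvAAAAAAAgFlAB1wUMyamoTwAAAAAAADwv/7//////zZA";
        double[] m = decodeMatrix(data);
        // Six values: the 2x2 rotation part plus a translation column
        System.out.println(Arrays.toString(m));
        // 180 for this sample
        System.out.println(snapToQuadrant(rotationDegrees(m)));
    }
}
```

Printing the decoded values also suggests why the "data" string differs between upside-down documents: the third column of the matrix is a translation, which for this sample is approximately the image size (102 by 23 pixels), so it changes with the dimensions of each input even when the rotation is the same.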