android kotlin firebase-mlkit google-mlkit

ML Kit text recognition: How to get text based on position in image?


I'm making an app for myself to scan receipts and put the data in a database. I tried ML Kit text recognition, and it works pretty well. However, I'm having trouble extracting the data from the recognized text. I'll explain with an example:

This is the format of the receipt, which is how I want to get the data:

+--------+--------+-------+-------+
| Amount |  Name  | Price | Total |
+--------+--------+-------+-------+
|      1 | Cheese |       | 1,15  |
|      1 | Eggs   |       | 2,59  |
|      2 | Milk   | 0,99  | 1,98  |
|      1 | Butter |       | 0,80  |
+--------+--------+-------+-------+

However, when I run the text recognition, the text comes back grouped in an unexpected order. For example, the receipt above gives these blocks:

Amount
Price
Name Cheese Eggs Milk 0,99
Butter
Total 1,15 2,59
1,98 0,80

It seems to skip the single numbers in the Amount column, but I can work around that. However, I can't figure out how to parse the data above into the format I want, especially how to connect the prices to the names. Is there a way to make the blocks follow the rows or columns of the receipt, instead of this seemingly random grouping?

Edit: when using lines or elements instead of blocks, I get the following result:

Amount
Price
Name
Cheese
Eggs
Milk
0,99
Butter
Total
1,15
2,59
1,98
0,80

However, I still have the same problem: how do I pair the items with the correct prices?


Solution

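  • Try using lines or elements instead of blocks in that case (https://developers.google.com/ml-kit/vision/text-recognition/android#4.-extract-text-from-blocks-of-recognized-text), and then use the position (bounding box) of each line or element to reconstruct the table: entries at roughly the same vertical position belong to the same row, and their horizontal order gives the columns.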
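
The row reconstruction could look roughly like the sketch below. It is a minimal illustration, not ML Kit's own API: `Box` and `groupIntoRows` are hypothetical names, and the "within half a text height" tolerance is an assumption that may need tuning for real receipts (for example, skewed photos). `Box` is a plain stand-in so the logic can run without Android; on a device its fields would be filled from each line's or element's `boundingBox`.

```kotlin
import kotlin.math.abs

// Hypothetical, framework-free stand-in for one recognized line or element.
// On Android the values would come from ML Kit, for example:
//   result.textBlocks.flatMap { it.lines }.mapNotNull { line ->
//       line.boundingBox?.let { r -> Box(line.text, r.left, r.top, r.right, r.bottom) }
//   }
data class Box(val text: String, val left: Int, val top: Int, val right: Int, val bottom: Int) {
    val centerY: Int get() = (top + bottom) / 2
    val height: Int get() = bottom - top
}

// Groups boxes into table rows. Boxes are visited top to bottom; a box joins
// the current row when its vertical center is within half the height of the
// row's first box (assumed tolerance), otherwise it starts a new row.
// Each finished row is sorted left to right so columns appear in reading order.
fun groupIntoRows(boxes: List<Box>): List<List<Box>> {
    val rows = mutableListOf<MutableList<Box>>()
    for (box in boxes.sortedBy { it.centerY }) {
        val row = rows.lastOrNull()
        val anchor = row?.first()
        if (row != null && anchor != null &&
            abs(box.centerY - anchor.centerY) <= anchor.height / 2
        ) {
            row.add(box)
        } else {
            rows.add(mutableListOf(box))
        }
    }
    return rows.map { row -> row.sortedBy { it.left } }
}
```

With rows rebuilt this way, the name is the leftmost text entry in a row and the total is the rightmost; a row with three entries (such as the Milk row) has a unit price in between. Comparing each entry's `left` against the header positions ("Price", "Total") would be a more robust way to assign columns.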