Python: receipt formatting

I want to write a Python program that receives the data from the barcode reader and from the money counter, and finally prints all of it on one receipt. Adding the BRC data is not a problem, but I don't know how to rework this receipt, because the PCS. and AMOUNT columns are one row lower than the CUR. and DENOMI. columns. Do you have any idea how to solve this, or what the easiest way is? Here is how it should look:

              TRANSACTION NO. 7487215638 

CUR. DENOMI.                            

HUF 500         PCS.              AMOUNT

                 26              13,000 

       (          1) (              500)

HUF 1000                                

                 28              28,000 

HUF 2000                                

                 27              54,000 

       (          1) (            2,000)

HUF 5000                                

                 15              75,000 

HUF 10000                               

                 24             240,000 

HUF 20000                               

                 10             200,000 

----------------------------------------

HUF                                     

  TOTAL         130             610,000 

       (          2) (            2,500)

I don't know which approach to take. Should I split the whole receipt and put it in a list? How should I start?

Solution

It's best to split the text and then process it. You can also combine this with a regular expression that extracts the fields you need and skips the ones you don't:

(?im)HUF\s+(?:(\d+)\s+)?([A-Z0-9,.]+)\s+([A-Z0-9.,]+)\s*\(?\s*([0-9,.]+)\s*\)?\s*\(?\s*([0-9,.]+)\s*\)?\s*(\(?\s*([0-9,.]+)\s*\)\s*)*
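To make the dense pattern easier to read, here is the same expression as the one used in the code that follows, split over several lines with `re.VERBOSE` and one comment per part. The only other change is that the inline `(?im)` flags are passed as `re.IGNORECASE | re.MULTILINE`; the name `PATTERN` is just for illustration.

```python
import re

# Same pattern as in the answer's code, split with re.VERBOSE so every part
# can carry a comment. The inline (?im) flags become re.IGNORECASE | re.MULTILINE.
PATTERN = re.compile(r"""
    HUF\s+
    (?:(\d+)\s+)?                  # group 1: denomination such as 500 ('' for the TOTAL block)
    ([A-Z0-9,.]+)\s+               # group 2: next token: PCS. header, piece count, or TOTAL
    ([A-Z0-9.,]+)\s*               # group 3: next token: AMOUNT header, amount, or total pieces
    \(?\s*([0-9,.]+)\s*\)?\s*      # group 4: next number, brackets optional
    \(?\s*([0-9,.]+)\s*\)?\s*      # group 5: next number, brackets optional
    (\(?\s*([0-9,.]+)\s*\)\s*)*    # groups 6-7: last of any further bracketed values
""", re.IGNORECASE | re.MULTILINE | re.VERBOSE)
```

Because only whitespace and comments were added, `PATTERN.findall(s)` should return the same list as `re.findall(p, s)`.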

Code:

import re

s = """
TRANSACTION NO. 7487215638 

CUR. DENOMI.                            

HUF 500         PCS.              AMOUNT

                 26              13,000 

       (          1) (              500)

HUF 1000                                

                 28              28,000 

HUF 2000                                

                 27              54,000 

       (          1) (            2,000)

HUF 5000                                

                 15              75,000 

HUF 10000                               

                 24             240,000 

HUF 20000                               

                 10             200,000 

----------------------------------------

HUF                                     

  TOTAL         130             610,000 

       (          2) (            2,500)
"""

# findall returns one tuple of the 7 capture groups for every "HUF" block
p = r'(?im)HUF\s+(?:(\d+)\s+)?([A-Z0-9,.]+)\s+([A-Z0-9.,]+)\s*\(?\s*([0-9,.]+)\s*\)?\s*\(?\s*([0-9,.]+)\s*\)?\s*(\(?\s*([0-9,.]+)\s*\)\s*)*'

f = re.findall(p, s)

print(f)

Prints

[('500', 'PCS.', 'AMOUNT', '26', '13,000', '( 500)\n\n', '500'), ('1000', '28', '28,0', '0', '0', '', ''), ('2000', '27', '54,000', '1', '2,000', '', ''), ('5000', '15', '75,0', '0', '0', '', ''), ('10000', '24', '240,0', '0', '0', '', ''), ('20000', '10', '200,0', '0', '0', '', ''), ('', 'TOTAL', '130', '610,000', '2', '( 2,500)\n', '2,500')]
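Whichever way the values are pulled out, they come back as strings with thousands separators, such as '13,000'. Below is a minimal sketch of converting them to integers and cross-checking each row against the TOTAL line. The values are copied by hand from the sample receipt, and the names `to_int` and `check_receipt` are only illustrative.

```python
# Values copied from the sample receipt: (denomination, pieces, amount)
ROWS = [
    (500, "26", "13,000"),
    (1000, "28", "28,000"),
    (2000, "27", "54,000"),
    (5000, "15", "75,000"),
    (10000, "24", "240,000"),
    (20000, "10", "200,000"),
]
TOTAL_PCS, TOTAL_AMOUNT = "130", "610,000"


def to_int(text: str) -> int:
    """Turn a receipt number such as '240,000' into 240000."""
    return int(text.replace(",", ""))


def check_receipt(rows, total_pcs, total_amount):
    """Return a list of problems; an empty list means the figures add up."""
    problems = []
    for denomination, pcs, amount in rows:
        # Each row should satisfy pieces x denomination == amount.
        if denomination * to_int(pcs) != to_int(amount):
            problems.append(f"HUF {denomination}: {pcs} x {denomination} != {amount}")
    if sum(to_int(p) for _, p, _ in rows) != to_int(total_pcs):
        problems.append("piece count does not match TOTAL")
    if sum(to_int(a) for _, _, a in rows) != to_int(total_amount):
        problems.append("amount does not match TOTAL")
    return problems


print(check_receipt(ROWS, TOTAL_PCS, TOTAL_AMOUNT))  # prints []
```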

Note:

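The raw tuples still need post-processing: the PCS./AMOUNT header lands in the first match, the bracketed values end up in different positions depending on the block, and where a denomination has no bracketed line the pattern backtracks into the amount (for example '28,000' comes out as '28,0', '0', '0'). You can write algorithms to process the text part by part or as a whole.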
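As a sketch of processing the receipt part by part, the code below walks through it line by line and keeps one record per HUF block, so the PCS./AMOUNT row and the optional bracketed row are attached to the denomination line above them. This is a different technique from the single regex above. The three line patterns are assumptions based only on the sample in the question, and all names (`SAMPLE`, `parse_receipt`, `format_rows`) are hypothetical. `format_rows` then prints CUR., DENOMI., PCS. and AMOUNT on a single row.

```python
import re

# Line shapes assumed from the sample receipt only; other devices may differ.
CURRENCY_LINE = re.compile(r"^([A-Z]{3})(?:\s+(\d+))?(?=\s|$)")                   # "HUF 500 ..." or "HUF"
COUNT_LINE = re.compile(r"^\s*(?:TOTAL\s+)?(\d[\d,]*)\s+(\d[\d,]*)\s*$")           # "26   13,000"
BRACKET_LINE = re.compile(r"^\s*\(\s*(\d[\d,]*)\s*\)\s*\(\s*(\d[\d,]*)\s*\)\s*$")  # "( 1) ( 500)"

SAMPLE = """\
TRANSACTION NO. 7487215638
CUR. DENOMI.
HUF 500         PCS.              AMOUNT
                 26              13,000
       (          1) (              500)
HUF 1000
                 28              28,000
HUF 2000
                 27              54,000
       (          1) (            2,000)
HUF 5000
                 15              75,000
HUF 10000
                 24             240,000
HUF 20000
                 10             200,000
----------------------------------------
HUF
  TOTAL         130             610,000
       (          2) (            2,500)
"""


def to_int(text):
    """'13,000' -> 13000"""
    return int(text.replace(",", ""))


def parse_receipt(text):
    """Return one dict per currency block; denomination None marks the TOTAL block."""
    records = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator lines carry no data
        if m := CURRENCY_LINE.match(line):
            # Start a new block; the PCS./AMOUNT header on the first line is ignored.
            current = {"currency": m.group(1),
                       "denomination": int(m.group(2)) if m.group(2) else None,
                       "pcs": None, "amount": None, "bracketed": None}
            records.append(current)
        elif current is not None and (m := COUNT_LINE.match(line)):
            # The row below the currency line: piece count and amount.
            current["pcs"], current["amount"] = to_int(m.group(1)), to_int(m.group(2))
        elif current is not None and (m := BRACKET_LINE.match(line)):
            # Optional bracketed row, kept as (pieces, amount).
            current["bracketed"] = (to_int(m.group(1)), to_int(m.group(2)))
        # Anything else (title, CUR. DENOMI. header, dashes) is skipped.
    return records


def format_rows(records):
    """Print CUR., DENOMI., PCS. and AMOUNT on one row per block."""
    lines = [f"{'CUR.':<5}{'DENOMI.':>8}{'PCS.':>8}{'AMOUNT':>12}"]
    for r in records:
        if r["pcs"] is None:
            continue  # a block without a count row has nothing to print
        denomination = "TOTAL" if r["denomination"] is None else r["denomination"]
        row = f"{r['currency']:<5}{denomination:>8}{r['pcs']:>8}{r['amount']:>12,}"
        if r["bracketed"]:
            b_pcs, b_amount = r["bracketed"]
            row += f"  ({b_pcs}) ({b_amount:,})"
        lines.append(row)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_rows(parse_receipt(SAMPLE)))
```

Giving each column a fixed-width f-string field (with the `,` option for the thousands separator) keeps the columns aligned regardless of how many digits a value has, up to the chosen width. The barcode reader data could be appended to the same list of lines before printing.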