Tags: python-3.x, python-re

Regex error when extracting details from a bank statement


I am working with regex and trying to extract the name, IFSC code and account number from a PDF bank statement. I am using the following code to extract the details:

acc_name= " ", '\n'.join([re.sub(r'^[\d \t]+|[\d \t]+:$', '', line) for line in data.splitlines() if 'Mr. ' in line])
acc_no= " ", '\n'.join([re.sub(r'Account Number\s+:', '', line) for line in data.splitlines() if 'Account Number' in line])
acc_code = " ", '\n'.join([re.sub(r'IFSC Code\s+:', '', line) for line in data.splitlines() if 'IFSC Code' in line])

But the output I get back is the following:

(' ', ' 50439602642')
(' ', 'Mr. MOHD AZFAR ALAM LARI')
(' ', ' ALLA0211993')

I want to remove the commas, brackets and quotes. I am new to regex, so any help would be appreciated.


Solution

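  • You're creating a tuple. The comma between " " and '\n'.join(...) makes the right-hand side a two-element tuple instead of a single string:

    >>> " ", "\n"
    (' ', '\n')
    >>>
    

    As you can see, a tuple is created. You probably meant either a space followed by a newline as the separator: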

    acc_name= ' \n'.join([re.sub(r'^[\d \t]+|[\d \t]+:$', '', line) for line in data.splitlines() if 'Mr. ' in line])
    acc_no= ' \n'.join([re.sub(r'Account Number\s+:', '', line) for line in data.splitlines() if 'Account Number' in line])
    acc_code = ' \n'.join([re.sub(r'IFSC Code\s+:', '', line) for line in data.splitlines() if 'IFSC Code' in line])
    

    Or just a space:

    acc_name= ' '.join([re.sub(r'^[\d \t]+|[\d \t]+:$', '', line) for line in data.splitlines() if 'Mr. ' in line])
    acc_no= ' '.join([re.sub(r'Account Number\s+:', '', line) for line in data.splitlines() if 'Account Number' in line])
    acc_code = ' '.join([re.sub(r'IFSC Code\s+:', '', line) for line in data.splitlines() if 'IFSC Code' in line])
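As a complement to the fix above, here is a minimal sketch of a different approach that captures each value with `re.search` instead of stripping the label with `re.sub`. The sample `data` text and the helper name `extract_field` are assumptions made for illustration, since the question does not show the real layout of the PDF text.

```python
import re

# Hypothetical sample text; the real PDF text layout is not shown in the question.
data = """Mr. MOHD AZFAR ALAM LARI
Account Number  : 50439602642
IFSC Code  : ALLA0211993
"""


def extract_field(text, label):
    """Return the value after 'label :' on the first matching line, or None.

    extract_field is an illustrative helper, not part of the original code.
    """
    match = re.search(rf"{re.escape(label)}\s*:\s*(.+)", text)
    return match.group(1).strip() if match else None


acc_no = extract_field(data, "Account Number")
acc_code = extract_field(data, "IFSC Code")

# The name line carries no label, so take the first line containing "Mr. ".
acc_name = next(
    (line.strip() for line in data.splitlines() if "Mr. " in line),
    None,
)

print(acc_name)
print(acc_no)
print(acc_code)
```

Each variable is a plain string here, so printing it shows no brackets, commas or quotes, and `.strip()` also drops the leading space that appears in the original output.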