Tags: python, regex, csv, findall

Python regex findall to read line in .csv file


I have a .csv file (or could happily be a .txt file) with some records in it:

JB74XYZ Kerry   Katona  44  Mansion_House   LV10YFB
WL67IAM William Iam 34  The_Voice_Street    LN44HJU

etc.

I have used Python to open and read the file, then used regex findall (and attempted a similar regex rule) to identify a match:

import re
from re import findall

reg = "JB74XYZ"

with open("RegDD.txt", "r") as file:
    data = file.read()
    search = findall(reg, data)

print(search)

which gives the resulting output:

['JB74XYZ']

I have tested this, and the regex findall seems to be working, in that it correctly identifies a 'match' and returns it.
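This behaviour is expected: re.findall returns only the text the pattern itself matched, so a pattern consisting of just the registration gives back just the registration. A minimal regex-only sketch, assuming each record sits on its own line, is to extend the pattern so it runs to the end of the line. The inline sample string is an illustrative stand-in for the contents of RegDD.txt.

```python
import re

# Illustrative stand-in for the contents of RegDD.txt (one record per line).
data = (
    "JB74XYZ Kerry   Katona  44  Mansion_House   LV10YFB\n"
    "WL67IAM William Iam 34  The_Voice_Street    LN44HJU\n"
)

reg = "JB74XYZ"

# re.escape() treats the registration as literal text, "\b" stops it from
# matching a longer registration that merely starts the same way, and ".*"
# consumes the rest of the line ("." does not match a newline by default).
# re.MULTILINE makes "^" match at the start of every line, so only lines
# that *begin* with the registration are returned.
pattern = r"^" + re.escape(reg) + r"\b.*"
search = re.findall(pattern, data, re.MULTILINE)

print(search)  # ['JB74XYZ Kerry   Katona  44  Mansion_House   LV10YFB']
```

Reading the real file with file.read(), as in the original code, and passing that string to the same findall call should work the same way.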

  1. My question is: how do I get the rest of each matched line returned as well? (Eventually I will write this to a new file, but for now I just want the matching line printed.)

I have explored Python dictionaries as one way of indexing things, but I hit a wall and got no further than the regex returning a positive result.
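For reference, a dictionary index along those lines might look like the sketch below. It assumes fields are separated by whitespace (spaces or tabs) and that the registration is always the first field; the name records and the inline sample lines are illustrative only.

```python
# Illustrative sample lines standing in for RegDD.txt; fields are assumed
# to be separated by whitespace (spaces or tabs).
lines = [
    "JB74XYZ Kerry   Katona  44  Mansion_House   LV10YFB",
    "WL67IAM William Iam 34  The_Voice_Street    LN44HJU",
]

# Map each registration (first field) to the full list of fields on its line.
records = {}
for line in lines:
    fields = line.split()  # split on any run of whitespace
    if fields:  # skip blank lines
        records[fields[0]] = fields

# Lookup is now a single dictionary access; .get() returns None if absent.
print(records.get("JB74XYZ"))
```

With the real file, iterate over the open file object (for line in file:) instead of the sample list. If the same registration appears on more than one line, later lines overwrite earlier ones in this sketch.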

  2. I guess a second question might be: am I choosing the wrong approach altogether?

I hope I have been specific enough; this is my first question here. I have spent hours (not minutes) looking for specific solutions and trying out a few ideas. I'm guessing this is not an especially tricky concept, but I could do with a few hints if possible.


Solution

  • A better way to handle this would be to use Python's csv module. From the look of your file, I'm guessing it's tab-delimited, so I'm working from that assumption.

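    import csv
    
    match = "JB74XYZ"
    
    matched_row = None
    # The csv docs recommend opening files with newline="" for csv.reader.
    with open("RegDD.txt", "r", newline="") as file:
        # Read file as a CSV delimited by tabs.
        reader = csv.reader(file, delimiter='\t')
        for row in reader:
            # Check the first (0-th) column. Blank lines come back as empty
            # lists, so test `row` first to avoid an IndexError.
            if row and row[0] == match:
                # Found the row we were looking for.
                matched_row = row
                break
    
    print(matched_row)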
    

    This should then output the following from matched_row:

    ['JB74XYZ', 'Kerry', 'Katona', '44', 'Mansion_House', 'LV10YFB']
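Two assumptions in the code above could be loosened. The sample rows look aligned with runs of spaces rather than tabs; if so, csv.reader with delimiter='\t' would return each line as a single field and the comparison would never succeed. The question also mentions eventually writing matches to a new file. The sketch below handles both under stated assumptions: str.split() is used in place of a fixed delimiter, every matching row is kept (not just the first), and the output file is tab-delimited. The function copy_matching_rows and the output name matches.txt are hypothetical.

```python
import csv


def copy_matching_rows(src_path, dst_path, match):
    """Copy every row of src_path whose first field equals match into
    dst_path as tab-delimited CSV, and return the matched rows.

    Assumption: fields in src_path are separated by any run of whitespace
    (spaces or tabs), which str.split() handles without a fixed delimiter.
    """
    matched_rows = []
    with open(src_path, "r") as src:
        for line in src:
            row = line.split()  # also drops the trailing newline
            if row and row[0] == match:  # blank lines give [], so skip them
                matched_rows.append(row)

    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t")
        writer.writerows(matched_rows)

    return matched_rows
```

For example, copy_matching_rows("RegDD.txt", "matches.txt", "JB74XYZ") would write the JB74XYZ row to matches.txt and return it as a list of lists. Note that str.split() will break any field that contains spaces, so this only suits data like the sample, where multi-word values use underscores (Mansion_House).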