
Using re.findall when reading CSV


I am trying to read in a CSV file and extract a specific portion of each row using re.findall.

Here are the first few lines of my CSV file:

School: Johnson County Elementary School | Student First Name: John | Student Last Name: Doe, 1, Please leave yearbook with sister in office
School: Kirkwood Elementary School | Student First Name: Karen | Student Last Name: Rodgers, 3, Null
School: 2nd Street Elementary School | Student First Name: Joe | Student Last Name: Greene, 12, Give to mom at pickup

Here is the code I am using:

import csv
import re

def fileReader():
    while True:
        input_file = input('What file would you like to read from? (or stop) ')
        if input_file.upper() == 'STOP':
            break
        schools = input('What school would you like to generate reports for? ')
        file_contents = open(input_file, newline='', encoding='utf-8')
        for row in csv.reader(file_contents):
            schoolName = re.findall('(?<=Student First Name: ).+?(?= |)',row[0], re.DOTALL)
            print(schoolName)


fileReader()

When I run this code, the output is only the first character of the school name, like this:

['J']
['K']
['2']

Instead, I want the whole school name, like this:

['Johnson County Elementary School']
['Kirkwood Elementary School']
['2nd Street Elementary School']

I am really confused about why re.findall returns only the first letter and not the whole school name.


Solution

  • First, look for "School:", not "Student First Name:" 😀

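    Then, | is special in regular expressions: it is the OR (alternation) operator, so (?= |) means "followed by a space or by nothing". The empty alternative always matches, so the lazy .+? stops after a single character. Escape the pipe as \| to match it literally, and use a raw string so Python does not treat the backslash as a string escape:

    schoolName = re.findall(r'(?<=School: ).+?(?= \|)', row[0], re.DOTALL)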
    

    You don't really need the csv module or lookahead/lookbehind to find the schools:

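    import re
    
    with open('input.csv', encoding='utf-8') as file:
        for row in file:
            match = re.search(r'School: (.+?) \|', row)
            if match:  # skip blank or malformed lines instead of raising AttributeError
                schoolName = match.group(1)
                print(schoolName)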
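To see the symptom and the fix side by side, here is a minimal, self-contained sketch. It assumes the three sample rows from the question, held in an io.StringIO instead of a real file; the names SAMPLE, BROKEN, FIXED and school_names are hypothetical, chosen only for illustration. The unescaped pattern reproduces the one-character output, and the escaped one returns the full school names.

```python
import csv
import io
import re

# Sample rows copied from the question; io.StringIO stands in for the real file.
SAMPLE = """\
School: Johnson County Elementary School | Student First Name: John | Student Last Name: Doe, 1, Please leave yearbook with sister in office
School: Kirkwood Elementary School | Student First Name: Karen | Student Last Name: Rodgers, 3, Null
School: 2nd Street Elementary School | Student First Name: Joe | Student Last Name: Greene, 12, Give to mom at pickup
"""

# Unescaped pipe: (?= |) means "followed by a space OR by nothing".
# The empty alternative always succeeds, so the lazy .+? stops after one character.
BROKEN = re.compile(r'(?<=School: ).+?(?= |)')

# Escaped pipe: \| is a literal "|", so .+? must keep expanding until " |" follows.
FIXED = re.compile(r'(?<=School: ).+?(?= \|)')


def school_names(lines):
    """Return the school name found in the first CSV column of each row."""
    names = []
    for row in csv.reader(lines):
        if not row:  # csv.reader yields [] for blank lines
            continue
        match = FIXED.search(row[0])
        if match:
            names.append(match.group(0))  # lookarounds are not part of group(0)
    return names


if __name__ == '__main__':
    rows = csv.reader(io.StringIO(SAMPLE))
    print([BROKEN.findall(row[0]) for row in rows])
    print(school_names(io.StringIO(SAMPLE)))
```

Running it should print [['J'], ['K'], ['2']] (the output from the question) followed by the three full school names. Keeping csv.reader confines the regex to the first column, so text in the notes column is never searched.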