Remove rows from CSV file containing certain characters


I am looking to remove rows from a CSV file if they contain specific strings anywhere in the row.

I'd like to write the result to a new output file rather than overwrite the original.

I need to remove any rows that contain "py-board" or "coffee".

Example Input:

173.20.1.1,2-base
174.28.2.2,2-game
174.27.3.109,xyz-b13-coffee-2
174.28.32.8,2-play
175.31.4.4,xyz-102-o1-py-board
176.32.3.129,xyz-b2-coffee-1
177.18.2.8,six-jump-walk

Expected Output:

173.20.1.1,2-base
174.28.2.2,2-game
174.28.32.8,2-play
177.18.2.8,six-jump-walk

I tried this, adapted from "Deleting rows with Python in a CSV file":

import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board" or if row[1] != "coffee":
            writer.writerow(row)

and I tried this

import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board":
            if row[1] != "coffee":
                writer.writerow(row)

and this

        if row[1][-8:] != "py-board":
            if row[1][-8:] != "coffee-1":
                if row[1][-8:] != "coffee-2":

but got this error

  File "C:\testing\syslogyamlclean.py", line 6, in <module>
    for row in csv.reader(inp):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

Solution

  • I would actually not use the csv package for this task. Each row only needs a substring check, so plain file reading and writing is enough.

    Try this code (I have written some comments to make it self-explanatory):

    # We open the source file and get its lines
    with open('input_csv_file.csv', 'r') as inp:
        lines = inp.readlines()
    
    # We open the target file in write-mode
    with open('purged_csv_file.csv', 'w') as out:
        # We go line by line writing in the target file
        # if the original line does not include the
        # strings 'py-board' or 'coffee'
        for line in lines:
            if 'py-board' not in line and 'coffee' not in line:
                out.write(line)
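
If you would rather stay with the csv module, the error in the question comes from opening the files in binary mode ('rb' / 'wb'). In Python 3, csv.reader expects strings, so both files need to be opened in text mode, ideally with newline=''. Note also that row[1] != "coffee" compares the whole field, so it never matches a value such as xyz-b13-coffee-2; a substring test with `in` is needed instead. Below is a minimal sketch under those assumptions. The names purge_rows and BANNED and the file names in the usage note are illustrative and not part of the original post.

```python
import csv

# Hypothetical constant: the substrings that disqualify a row.
BANNED = ("py-board", "coffee")


def purge_rows(src_path, dst_path, banned=BANNED):
    """Copy rows from src_path to dst_path, skipping any row in which
    some field contains one of the banned substrings.
    Returns the number of rows written."""
    # Text mode with newline='' is what the csv module expects in Python 3;
    # 'rb'/'wb' is what triggers "iterator should return strings, not bytes".
    with open(src_path, "r", newline="") as inp, \
         open(dst_path, "w", newline="") as out:
        writer = csv.writer(out)
        kept = 0
        for row in csv.reader(inp):
            # Substring test on every field, not an equality test.
            if any(term in field for field in row for term in banned):
                continue
            writer.writerow(row)
            kept += 1
    return kept
```

Calling purge_rows('input_csv_file.csv', 'purged_csv_file.csv') on the example input should leave only the four rows listed under Expected Output. Compared with the line-based version above, this variant streams the file row by row and keeps CSV quoting intact, at the cost of a little more code.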