I have a bunch of CSV files that I have edited and gotten rid of all of the lines that have 'DIF' in them. The problem that I realized later is that the count number in the file stays the same as before. Here is an example of the CSV before I edit it.
Name bunch of stuff
header stuff stuff
header stuff stuff
header stuff stuff
header stuff stuff
header stuff stuff
Count 11
NUMBER,ITEM
N1,Shoe
N2,Heel
N3,Tee
N4,Polo
N5,Sneaker
N6,DIF
N7,DIF
N8,DIF
N9,DIF
N10,Heel
N11,Tee
This is how the output CSV looks. I want the number next to 'Count' to equal the number now in the 'ITEMS' column as well as have everything in the 'NUMBER' column to be sequential.
Name bunch of stuff
header stuff stuff
header stuff stuff
header stuff stuff
header stuff stuff
header stuff stuff
Count 11
NUMBER,ITEM
N1,Shoe
N2,Heel
N3,Tee
N4,Polo
N5,Sneaker
N10,Heel
N11,Tee
Here is my current code that does that. It does what I want it to, but it screws up the rest of the CSV like I mentioned above.
import csv
import glob
import os
fns = glob.glob('*.csv') #goes through every CSV file in directory
for fn in fns:
reader=csv.reader(open(fn,"rb"))
with open (os.path.join('out', fn), 'wb') as f:
w = csv.writer(f)
for row in reader:
if not ' DIF' in row: #remove DIF
w.writerow(row)
I've tried a few small things to fix it, but I am fairly new to programming and nothing I try seems to do much. Any help would be appreciated.
Thank You
If you need to update the count, then you have to read twice and count the number of rows you are keeping first. You can keep a separate counter to rewrite the first column once you are writing the matched lines:
import re
numbered = re.compile(r'N\d+').match
for fn in fns:
# open for counting
reader = csv.reader(open(fn,"rb"))
count = sum(1 for row in reader if row and not any(r.strip() == 'DIF' for r in row) and numbered(row[0]))
# reopen for filtering
reader = csv.reader(open(fn,"rb"))
with open (os.path.join('out', fn), 'wb') as f:
counter = 0
w = csv.writer(f)
for row in reader:
if row and 'Count' in row[0].strip():
row = ['Count', count]
if row and not any(r.strip() == 'DIF' for r in row): #remove DIF
if numbered(row[0]):
counter += 1
row[0] = 'N%d' % counter
w.writerow(row)