Search code examples
pythoncsvpython-itertools

Iterating over specific csv rows in python outputs a blank file


python newb here - I'm trying to format a set of really gross csv's I was sent so that I can throw them into a nice postgres table for querying and analysis. In order to do this, I first cleanse them using csv.writer to remove the blank rows and double quotes that wrap each entry. Here's what my code looks like:

import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's  
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:

        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close() 

It works perfectly fine, and does exactly what I want it to do. The output csv looks great. Next, I try to essentially chop off a certain amount of rows that contain completely unnecessary junk from both the beginning and end of the newly cleansed csv files (omit the first 8 rows and the last 2). For a reason that I truly cannot ascertain, the csv's output from this portion of the code (indented the same as the earlier 'with' block) are completely empty:

with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
    writer2 = csv.writer(out2)
    reader2 = csv.reader(inp2)
    row_count = sum(1 for row in reader2)
    last_line_index = row_count - 3 
    for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
    out2.close()

I know that because of my 'with' usage, the close() at the end of each block is redundant - I tried it as an approach after looking here. I also tried putting the second 'with' block into a different file and running that after running the first 'with' block, but still to no avail. Your help is greatly appreciated!

Also, here's the whole file:

import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's  
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:

        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close() 

    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3 
        for row in islice(reader2, 7, last_line_index):
                writer2.writerow(row)
        out2.close()

Thanks!


Solution

  • The guilty party is

    row_count = sum(1 for row in reader2)
    

    it reads all the data from reader2; now when you try to for row in islice(reader2, 7, last_line_index) you don't get any data.

    Also, you are probably reading a lot of blank rows because you open the file as binary; instead do

    with open('file.csv', newline='') as inf:
        rd = csv.reader(inf)