I have a directory that contains a lot of .csv
files, and I am trying to write a script that runs on all the files in the directory and does the following operation:
Remove the first and last lines from every csv file
I am running the following code:
import glob
list_of_files = glob.glob('path/to/directory/*.csv')
for file_name in list_of_files:
    fi = open(file_name, 'r')
    fo = open(file_name.replace('csv', 'out'), 'w')  # make new output file for each file
    num_of_lines = file_name.read().count('\n')
    file_name.seek(0)
    i = 0
    for line in fi:
        if i != 1 and i != num_of_lines-1:
            fo.write(line)
    fi.close()
    fo.close()
I run the script using python3 script.py. I don't get any error, but I don't get any output files either.
There are multiple issues in your code. First of all you count the number of lines on the file name (a string) instead of the file object fi — a string has neither a read nor a seek method, so those two lines should actually raise an AttributeError. The second problem is that you initialize i = 0 and compare against it, but it never changes, so the condition is the same for every line (and since line indices start at 0, the first line would be i != 0, not i != 1). If you really see no error at all, double-check the glob pattern: if 'path/to/directory/*.csv' matches no files, the loop body never runs and you silently get no output.
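For completeness, the smallest change that makes your original approach work is to count on the file object, rewind it, and let enumerate advance the counter. Here it is wrapped in a helper function (strip_first_and_last is my name, not something from your script) so it can be tried on a single file; this sketch assumes each file ends with a trailing newline, otherwise the '\n' count comes up one short:

```python
def strip_first_and_last(in_path, out_path):
    # hypothetical helper wrapping your per-file logic with the bugs fixed
    fi = open(in_path, 'r')
    fo = open(out_path, 'w')
    num_of_lines = fi.read().count('\n')  # count on the file object, not the file name
    fi.seek(0)                            # rewind so the loop below sees the lines again
    for i, line in enumerate(fi):         # enumerate increments i for you
        if i != 0 and i != num_of_lines - 1:  # line indices start at 0
            fo.write(line)
    fi.close()
    fo.close()
```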
Personally I would just convert the file to a list of "lines", cut off the first and last and write all of them to the new file:
import glob
list_of_files = glob.glob('path/to/directory/*.csv')
for file_name in list_of_files:
    with open(file_name, 'r') as fi:
        with open(file_name.replace('csv', 'out'), 'w') as fo:
            for line in list(fi)[1:-1]:  # for all lines except the first and last
                fo.write(line)
Using with open allows you to omit the close calls (because they are done implicitly), even if an exception occurs.
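Since Python 2.7/3.1 you can also open both files in a single with statement, which saves one level of nesting — a minimal sketch of the same loop:

```python
import glob

for file_name in glob.glob('path/to/directory/*.csv'):
    # one `with` can manage both files; both are closed automatically
    with open(file_name, 'r') as fi, open(file_name.replace('csv', 'out'), 'w') as fo:
        for line in list(fi)[1:-1]:  # all lines except the first and last
            fo.write(line)
```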
In case that still gives no output you could add a print statement that shows which file is being processed:
print(file_name)  # just inside the for-loop, before any open calls
Since you're using python-3.5 you could also use pathlib:
import pathlib
path = pathlib.Path('path/to/directory/')
# make sure it's a valid directory
assert path.is_dir(), "{} is not a valid directory".format(path.absolute())
for file_name in path.glob('*.csv'):
    with file_name.open('r') as fi:
        with pathlib.Path(str(file_name).replace('.csv', '.out')).open('w') as fo:
            for line in list(fi)[1:-1]:  # for all lines except the first and last
                fo.write(line)
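As a side note (my suggestion, not part of the original answer): pathlib can also build the output name for you with with_suffix, which only touches the final extension, whereas replacing '.csv' in the string would also hit a '.csv' appearing earlier in the path:

```python
import pathlib

path = pathlib.Path('path/to/directory/')
for file_name in path.glob('*.csv'):
    # with_suffix swaps only the final extension: data.csv -> data.out
    with file_name.open('r') as fi, file_name.with_suffix('.out').open('w') as fo:
        for line in list(fi)[1:-1]:  # all lines except the first and last
            fo.write(line)
```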
As Jon Clements pointed out, there is a better way than [1:-1] to exclude the first and last line: a generator function. That way you avoid loading the whole file into memory, which might also improve the overall performance. For example you could use:
import pathlib

def ignore_first_and_last(it):
    it = iter(it)
    firstline = next(it)  # consume (and drop) the first line
    lastline = next(it)
    for nxtline in it:
        yield lastline    # a line is only yielded once we know it isn't the last
        lastline = nxtline

path = pathlib.Path('path/to/directory/')
# make sure it's a valid directory
assert path.is_dir(), "{} is not a valid directory".format(path.absolute())
for file_name in path.glob('*.csv'):
    with file_name.open('r') as fi:
        with pathlib.Path(str(file_name).replace('.csv', '.out')).open('w') as fo:
            for line in ignore_first_and_last(fi):  # for all lines except the first and last
                fo.write(line)
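A quick sanity check of the generator on a plain list (hypothetical data, just to illustrate the behaviour — note that it needs at least two items, otherwise the next calls raise StopIteration):

```python
def ignore_first_and_last(it):
    it = iter(it)
    firstline = next(it)  # consume (and drop) the first line
    lastline = next(it)
    for nxtline in it:
        yield lastline    # a line is only yielded once we know it isn't the last
        lastline = nxtline

print(list(ignore_first_and_last(['header', 'a', 'b', 'footer'])))  # → ['a', 'b']
```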