Tags: python, file-io, python-3.5, glob

Python glob gives no result


I have a directory that contains many .csv files, and I am trying to write a script that runs over all the files in the directory and performs the following operation:

Remove the first and last lines from all the csv files

I am running the following code:

import glob

list_of_files = glob.glob('path/to/directory/*.csv')
for file_name in list_of_files:
    fi = open(file_name, 'r')
    fo = open(file_name.replace('csv', 'out'), 'w')  #make new output file for each file
    num_of_lines = file_name.read().count('\n')
    file_name.seek(0)
    i = 0
    for line in fi:
        if i != 1 and i != num_of_lines-1:
            fo.write(line)

    fi.close()
    fo.close()

I run the script using python3 script.py. I don't get any error, but I don't get any output files either.


Solution

  • There are multiple issues in your code. First of all, you call .read() and .seek() on the string file_name instead of on the file object fi, so the line count is never taken from the file. The second problem is that you initialize i = 0 and compare against it, but never increment it inside the loop. Also note that line indices start at 0, so the first line is i == 0, not i == 1.
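
    For reference, a minimal repair of your original loop (keeping its structure) could look like the sketch below; enumerate replaces the manual counter, and the helper name strip_first_and_last is made up here just for illustration:

```python
import glob

def strip_first_and_last(in_name, out_name):
    # Hypothetical helper wrapping the original loop body.
    fi = open(in_name, 'r')
    fo = open(out_name, 'w')
    num_of_lines = fi.read().count('\n')  # count on the file object, not the name
    fi.seek(0)                            # rewind the file object before re-reading
    for i, line in enumerate(fi):         # enumerate advances i for us
        if i != 0 and i != num_of_lines - 1:  # first line is index 0
            fo.write(line)
    fi.close()
    fo.close()

for file_name in glob.glob('path/to/directory/*.csv'):
    strip_first_and_last(file_name, file_name.replace('csv', 'out'))
```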

    Personally, I would just convert the file to a list of lines, cut off the first and last, and write the rest to the new file:

    import glob
    
    list_of_files = glob.glob('path/to/directory/*.csv')
    for file_name in list_of_files:
        with open(file_name, 'r') as fi:
            with open(file_name.replace('csv', 'out'), 'w') as fo:
                for line in list(fi)[1:-1]:  # for all lines except the first and last
                    fo.write(line)
    

    Using with open lets you omit the close calls (they happen implicitly) even if an exception occurs.
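
    A quick way to convince yourself of that: raise an exception inside the with block and check that the file object was still closed (the path here is a temporary file, just for the demonstration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
f = None
try:
    with open(path, 'w') as f:
        f.write('first\nmiddle\nlast\n')
        raise ValueError('simulated error')
except ValueError:
    pass

print(f.closed)  # True: `with` closed the file despite the exception
```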


    In case that still gives no output, you could add a print statement that shows which file is being processed:

    print(file_name)  # just inside the for-loop before any `open` calls.
    

    Since you're using python-3.5 you could also use pathlib:

    import pathlib
    
    path = pathlib.Path('path/to/directory/')
    
    # make sure it's a valid directory
    assert path.is_dir(), "{} is not a valid directory".format(path.absolute())
    
    for file_name in path.glob('*.csv'):
        with file_name.open('r') as fi:
            with pathlib.Path(str(file_name).replace('.csv', '.out')).open('w') as fo:
                for line in list(fi)[1:-1]:  # for all lines except the first and last
                    fo.write(line)
    
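    As a side note, pathlib can also swap the extension itself via with_suffix, which avoids accidentally replacing a '.csv' that occurs elsewhere in the path:

```python
import pathlib

# PurePosixPath is used here only to keep the printed output platform-independent
p = pathlib.PurePosixPath('path/to/data.csv')
print(p.with_suffix('.out'))  # path/to/data.out
```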

    As Jon Clements pointed out, there is a better way than [1:-1] to exclude the first and last line: a generator function. That avoids loading the whole file into memory and may also improve overall performance. For example you could use:

    import pathlib
    
    def ignore_first_and_last(it):
        it = iter(it)
        next(it)               # skip the first line
        lastline = next(it)    # hold one line back at all times ...
        for nxtline in it:
            yield lastline     # ... so the last line is never yielded
            lastline = nxtline
    
    path = pathlib.Path('path/to/directory/')
    
    # make sure it's a valid directory
    assert path.is_dir(), "{} is not a valid directory".format(path.absolute())
    
    for file_name in path.glob('*.csv'):
        with file_name.open('r') as fi:
            with pathlib.Path(str(file_name).replace('.csv', '.out')).open('w') as fo:
                for line in ignore_first_and_last(fi):  # for all lines except the first and last
                    fo.write(line)
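
    To sanity-check the generator without touching any files, you can feed it a plain list (the function is repeated here only so the snippet is self-contained):

```python
def ignore_first_and_last(it):
    it = iter(it)
    next(it)               # skip the first line
    lastline = next(it)    # hold one line back at all times ...
    for nxtline in it:
        yield lastline     # ... so the last line is never yielded
        lastline = nxtline

print(list(ignore_first_and_last(['header', 'a', 'b', 'footer'])))
# ['a', 'b']
```

    One caveat: on Python 3.7+ an input with fewer than two lines would raise RuntimeError (PEP 479), because the bare next calls can raise StopIteration inside the generator; on python-3.5 the generator simply yields nothing in that case.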