
Processing filenames in Python


I've written a function to strip double spaces out of my raw data files:

import os
import re


def fixDat(file):
    '''
    Removes extra spaces in the data files. Replaces original file with new
    and renames original to "...._original.dat".

    `file` is the path WITHOUT the ".dat" extension, e.g. "run01" for "run01.dat".
    '''
    with open(file+'.dat', 'r') as infile:
        with open(file+'_fixed.dat', 'w') as outfile:
            for line in infile:  # iterate lazily instead of readlines()
                # collapse runs of 2+ spaces/tabs; unlike \s, this never eats the newline
                fixed = re.sub(r"[ \t]{2,}", " ", line)
                outfile.write(fixed)

    os.rename(file+'.dat', file+'_original.dat')
    os.rename(file+'_fixed.dat', file+'.dat')
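
A note on the regex: in a pattern such as \s\s+, the \s class also matches the newline character. On a line that ends with a stray space, the final " \n" is matched and replaced by a single space, so that line gets joined to the next one in the output file. The small comparison below (the sample line is made up) shows the difference when the match is restricted to spaces and tabs. Writing the pattern as a raw string (r"...") also avoids the invalid-escape warning newer Python versions give for "\s" in a normal string.

```python
import re

line = "1.0  2.0   3.0 \n"  # hypothetical data line with a stray trailing space

# \s also matches "\n", so the final " \n" collapses to " " and the line break disappears
joined = re.sub(r"\s\s+", " ", line)
print(repr(joined))  # '1.0 2.0 3.0 '

# limiting the match to spaces and tabs leaves the newline alone
kept = re.sub(r"[ \t]{2,}", " ", line)
print(repr(kept))  # '1.0 2.0 3.0 \n'
```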

I have 19 files in a folder that I need to process with this function, but I'm not sure how to list the filenames and pass them to the function. Something like:

for filename in folder:
    fixDat(filename)

But how do I write filename and folder in Python?


Solution

  • If I understand correctly, you are asking about the os module's walk() function. An example would look like this:

    import os
    for root, dirs, files in os.walk(".", topdown=False):  # "." is the current working directory
        # replace "." with a folder path to process files somewhere else
        for name in files:
            print(os.path.join(root, name))
    

    This prints one path per file, relative to the starting folder, for example:

    ./tmp/test.py
    ./amrood.tar.gz
    ./httpd.conf
    ./www.tar.gz
    ./mysql.tar.gz
    ./test.py
    

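    Note that these are all strings, so you can filter them by extension and pass the matches to fixDat(). Because fixDat() adds the '.dat' extension itself, strip it from the path before the call:

    import os
    for root, dirs, files in os.walk(".", topdown=False):
        for name in files:
            # match .dat files, skipping the *_original.dat backups fixDat() creates
            if name.endswith('.dat') and not name.endswith('_original.dat'):
                path = os.path.join(root, name)
                print(path)
                fixDat(os.path.splitext(path)[0])  # fixDat() adds '.dat' itself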
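
    Since all 19 files sit in one folder, os.walk() may be more than you need: it also descends into every subdirectory. A minimal sketch for a single flat folder using glob, assuming fixDat() takes the path without ".dat" as in the question. The helper name process_folder is hypothetical, and the fixDat() body is a copy that uses the narrower [ \t]{2,} pattern rather than the question's \s\s+:

```python
import glob
import os
import re


def fixDat(file):
    """Collapse runs of spaces/tabs in file + '.dat'; keep the original as file + '_original.dat'."""
    with open(file + '.dat', 'r') as infile, open(file + '_fixed.dat', 'w') as outfile:
        for line in infile:
            outfile.write(re.sub(r"[ \t]{2,}", " ", line))
    os.rename(file + '.dat', file + '_original.dat')
    os.rename(file + '_fixed.dat', file + '.dat')


def process_folder(folder):
    """Hypothetical helper: run fixDat() on every .dat file directly inside folder (no recursion)."""
    processed = []
    # glob's order is arbitrary; sorted() just makes it predictable.
    # The list is built before any renaming, so new *_original.dat files are not picked up.
    for path in sorted(glob.glob(os.path.join(folder, '*.dat'))):
        if path.endswith('_original.dat'):
            continue  # backup from an earlier run; don't process it again
        fixDat(os.path.splitext(path)[0])  # fixDat() appends '.dat' itself
        processed.append(path)
    return processed
```

    Call it as process_folder('path/to/folder'), where the path is a placeholder for your own folder. It returns the list of files it processed.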