
Python - glob.glob doesn't find *.txt in specified filepath within Unix OS


I am converting some Python scripts I wrote in a Windows environment to run in Unix (Red Hat 5.4), and I'm having trouble converting the lines that deal with filepaths. In Windows, I usually read in all .txt files within a directory using something like:

pathtotxt = "C:\\Text Data\\EJC\\Philosophical Transactions 1665-1678\\*\\*.txt"
for file in glob.glob(pathtotxt):

It seems one can use the glob.glob() method in Unix as well, so I'm trying to implement this method to find all text files within a directory entitled "source" using the following code:

#!/usr/bin/env python
import commands
import sys
import glob
import os

testout = open('testoutput.txt', 'w')
numbers = [1,2,3]
for number in numbers:
    testout.write(str(number + 1) + "\r\n")
testout.close

sourceout = open('sourceoutput.txt', 'w')
pathtosource = "/afs/crc.nd.edu/user/d/dduhaime/data/hill/source/*.txt"
for file in glob.glob(pathtosource):
    with open(file, 'r') as openfile:
        readfile = openfile.read()
        souceout.write (str(readfile))
sourceout.close

When I run this code, the testoutput.txt file comes out as expected, but the sourceoutput.txt file is empty. I thought the problem might be solved if I changed the line

pathtosource = "/afs/crc.nd.edu/user/d/dduhaime/data/hill/source/*.txt"

to

pathtosource = "/source/*.txt"

and then run the code from the /hill directory, but that didn't resolve my problem. Do others know how I might be able to read in the text files in the source directory? I would be grateful for any insights others can offer.

EDIT: In case it is relevant, the /afs/ tree of directories referenced above is located on a remote server that I'm ssh-ing into via PuTTY. I'm also using a test.job file to qsub the Python script above. (This is all to prepare myself to submit jobs on the SGE cluster system.) The test.job script looks like:

#!/bin/csh
#$ -M [email protected]
#$ -m abe
#$ -r y
#$ -o tmp.out
#$ -e tmp.err
module load python/2.7.3
echo "Start - `date`"
python tmp.py 
echo "Finish - `date`"

Solution

  • Got it! I had misspelled the output command. I wrote

    souceout.write (str(readfile))
    

    instead of

    sourceout.write (str(readfile))
    

    What a dunce. I also appended a line break after each file's contents:

    sourceout.write (str(readfile) + "\r\n")
    

    and it works fine. I think it's time for a new IDE!
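
For anyone landing here with the same symptom: when the glob does match files, the misspelled name raises a NameError on the first write. Since the job file sends stderr to tmp.err (`#$ -e tmp.err`), the traceback would presumably end up there rather than on screen, which would explain why sourceoutput.txt was created but stayed empty. Note too that `testout.close` and `sourceout.close` without parentheses only reference the method and never call it. Below is a minimal sketch of the same loop using `with` blocks, which close the files automatically. The function name `concatenate_text_files` and its parameters are my own for illustration, and it writes "\n" rather than "\r\n" on the assumption that the output stays on Unix.

```python
import glob


def concatenate_text_files(pattern, out_path):
    """Write the contents of every file matching `pattern` into `out_path`.

    Returns the sorted list of matched paths so the caller can check
    whether the glob actually found anything.
    (Hypothetical helper, not part of the original script.)
    """
    # Expand the pattern before opening the output file.
    matches = sorted(glob.glob(pattern))
    # Each `with` block closes its file on exit, even if an exception is raised.
    with open(out_path, "w") as sourceout:
        for path in matches:
            with open(path, "r") as openfile:
                sourceout.write(openfile.read() + "\n")
    return matches
```

Calling `concatenate_text_files("source/*.txt", "sourceoutput.txt")` from the hill directory would then pick up the files in source; an empty return value means the pattern matched nothing. A pattern with a leading slash, such as "/source/*.txt", is absolute and is looked up from the filesystem root, not from the current directory.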