
Python: linecache not working as expected?


Hello, I have a Python script that traverses some directories and extracts information from files inside them.

So I have many directories. Each of them contains 5 subdirectories, and each of those subdirectories contains 3 text files. One is a .txt file, which I ignore. Another is a .out file, which I need to read to see whether it has a line with the word "Fin". If it does, I then have to read the remaining file, which has a .time extension. This file holds the output of the Unix time command, which looks like this:

real    0m1.185s
user    0m0.027s
sys     0m0.026s

From this file I need to extract the "real" line (real 0m1.185s), which is the second line of the file; the first line is just a '\n'.
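As a sketch of that conversion, the "real" line can be turned into seconds by multiplying the minutes by 60 and adding the seconds. The helper name `real_seconds` is hypothetical, and the regular expression assumes the "XmY.YYYs" layout shown above, separated by either a tab or spaces.

```python
import re


def real_seconds(line):
    """Convert a `time` line such as "real\t0m1.185s" into seconds (float).

    Assumes the minutes/seconds layout shown above; anything else raises
    ValueError instead of silently producing a wrong number.
    """
    match = re.match(r"\s*real\s+(\d+)m([\d.]+)s\s*$", line)
    if match is None:
        raise ValueError("not a 'real' time line: %r" % line)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


print(real_seconds("real\t0m1.185s"))     # 1.185
print(real_seconds("real    2m03.500s"))  # 123.5
```

Matching the whole line, rather than splitting on the characters m and s, makes an unexpected line (for example the blank first line) fail loudly. It also accepts spaces as well as a tab between "real" and the value.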

So I extract this line from the .time file in each of the 5 subdirectories of the current directory (5 files in total), sum up the number of seconds each line indicates, and divide the total by 5 to get the average over the 5 subdirectories.
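The averaging step is the sum divided by the count. Here is a minimal sketch; `average_seconds` is a hypothetical name. Dividing by the number of values actually collected instead of a fixed 5 is a design choice of this sketch. It gives the same result when all five subdirectories report, and it avoids understating the average when a .out file lacks "Fin" and its subdirectory is skipped.

```python
def average_seconds(values):
    """Mean of the per-subdirectory 'real' times, in seconds.

    Divides by len(values) rather than a hard-coded 5, so skipped
    subdirectories do not drag the average down.
    """
    if not values:
        raise ValueError("no timings were collected")
    return sum(values) / len(values)


# Hypothetical readings from five subdirectories
print(average_seconds([1.0, 2.0, 3.0, 4.0, 5.0]))  # 3.0
```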

Then I write each of these averages to an output file. So if I have two directories:

1/
2/

Each one of these directories has 5 subdirectories:

1/1 1/2 1/3 1/4 1/5
2/1 2/2 2/3 2/4 2/5

Those subdirectories contain the text files. For example, 1/1 contains a something.out file, hopefully with the word "Fin" inside. If it has it, there is also a something.time file in 1/1 from which I extract the "real" line. Then I sum the values from the .time files in 1/1, 1/2, 1/3, 1/4 and 1/5, divide the sum by 5 to get the average, and write this average to an output file.

The problem is that I use linecache.getline to extract the second line of the something.time file, and it doesn't work properly: it oddly extracts the same line in every subdirectory. In subdirectory 1/1 the second line of something.time is "real 0m1.809s", and my code reads that correctly. But when it moves into 1/2 and extracts the second line of something.time there, it again shows "real 0m1.809s", even though cat shows that the file in 1/2 contains "real 0m1.009s".

The same happens in the 2/ directory: it extracts the line from the file in the first subdirectory it enters, and then just repeats that line 5 times.
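The symptom can be reproduced on its own. This sketch assumes two hypothetical temporary directories, each holding a file named something.time with different contents. After os.chdir() into each one, it calls linecache.getline() with the same relative name. linecache keys its cache on the filename string, and getline() does not re-read a file it has already cached, so both calls return the line from the first directory.

```python
import linecache
import os
import tempfile


def demo_stale_cache():
    """Read line 2 of "something.time" from two directories via linecache."""
    orig_cwd = os.getcwd()
    linecache.clearcache()  # start from an empty cache
    results = []
    with tempfile.TemporaryDirectory() as base:
        # Two hypothetical subdirectories holding same-named files with different contents
        for sub, real in (("1", "real\t0m1.809s"), ("2", "real\t0m1.009s")):
            os.mkdir(os.path.join(base, sub))
            with open(os.path.join(base, sub, "something.time"), "w") as fh:
                fh.write("\n" + real + "\nuser\t0m0.027s\nsys\t0m0.026s\n")
        try:
            for sub in ("1", "2"):
                os.chdir(os.path.join(base, sub))
                # Same relative name, hence the same cache key, in both directories
                results.append(linecache.getline("something.time", 2).strip())
        finally:
            os.chdir(orig_cwd)      # leave the temp dir before it is deleted
            linecache.clearcache()  # drop the stale entry
    return results


print(demo_stale_cache())
```

Both entries in the returned list are the 1.809s line. Passing full paths, so that each file gets its own cache key, or calling linecache.clearcache() before each read avoids the stale result.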

Here is my code. Can someone point out where my error is?

import linecache
import os
import re

def proArch(dirArch):
    dirList = os.listdir(dirArch)
    resultado = 0
    valores = []
    for f in dirList:
        if "out" in f:
            for linea in open(f):
                if "Fin" in linea:
                    for f_v in dirList:
                        if "time" in f_v:
                            linea = linecache.getline(f_v, 2)
                            valores = re.split("['\tms']", linea)[1:3]
                            resultado = (float(valores[0]) * 60) + float(valores[1])
                        else:
                            # "The file <dirArch> was not processed correctly."
                            print("El archivo " + dirArch + " no se proceso bien.")

    return resultado


dirList_g = os.listdir(".")
dirOrig = os.getcwd()
res_tot = 0.0
for d in dirList_g:
    if os.path.isdir(d):
        os.chdir(dirOrig + "/" + d)
        dirAct = os.getcwd()
        dirList_w = os.listdir(".")
        for d_w in dirList_w:
            os.chdir(dirAct + "/" + d_w)
            dirArch = os.getcwd()
            res_tot = res_tot + proArch(dirArch)

        res_tot = res_tot / 5
        os.chdir(dirOrig)
        with open("output.txt", "w") as text_file:
            text_file.write(dirAct + " " + str(res_tot) + "\n")
        res_tot = 0.0

Solution

  • It's likely that linecache is caching the line from the identically named file it read last time. It keys its cache on the filename string, and getline() does not re-read a file it has already cached.

    Also, it looks like you're not using the full file path, so you may be opening a different file than you expect.

    For example, instead of using f_v you'll want to do something like:

    filepath = os.path.join(dirArch, f_v)

    Try replacing linecache.getline with something like:

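    def get_line(filename, n):
        # n is 1-based, like linecache.getline
        with open(filename, 'r') as f:
            for line_number, line in enumerate(f, start=1):
                if line_number == n:
                    return line
        return ''  # like linecache, return '' if the file has fewer than n lines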

    Unlike linecache, this opens and reads the file on every call, so it cannot return a stale cached line.

    Finally, this code would likely be much clearer and easier to deal with if you rewrote it using os.walk:

    https://docs.python.org/2/library/os.html

    For example:

    import os

    for root, dirs, files in os.walk('someplace'):
        for dirname in dirs:
            print(os.path.join(root, dirname))   # do something with the dirs
        for filename in files:
            print(os.path.join(root, filename))  # do whatever with the files
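Here is a minimal sketch that puts these suggestions together: the whole job rewritten with os.walk, with full paths built by os.path.join and a plain open() in place of linecache. It assumes each leaf directory (1/1 through 2/5) holds exactly one file ending in .out and one ending in .time. This is a tighter test than the "out" in f and "time" in f_v substring checks in the question. All function names are hypothetical.

```python
import os
from collections import defaultdict


def read_real_seconds(time_path):
    """Return the 'real' time, in seconds, from a file holding `time` output."""
    # Opened fresh on every call, so there is no cache that can go stale
    with open(time_path) as fh:
        lines = fh.readlines()
    # lines[0] is the blank line; lines[1] looks like "real\t0m1.185s\n"
    minutes, seconds = lines[1].split()[1].rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def has_fin(out_path):
    """True if any line of the .out file contains "Fin"."""
    with open(out_path) as fh:
        return any("Fin" in line for line in fh)


def average_times(top="."):
    """Map each top-level directory to the mean 'real' time of its finished runs."""
    per_parent = defaultdict(list)
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit subdirectories in a predictable order
        out_files = sorted(f for f in files if f.endswith(".out"))
        time_files = sorted(f for f in files if f.endswith(".time"))
        if not out_files or not time_files:
            continue  # not a leaf directory such as 1/1
        if has_fin(os.path.join(root, out_files[0])):
            # Full paths throughout, so nothing depends on os.chdir()
            seconds = read_real_seconds(os.path.join(root, time_files[0]))
            per_parent[os.path.dirname(root)].append(seconds)
    return {parent: sum(vals) / len(vals)
            for parent, vals in sorted(per_parent.items())}


def write_report(averages, output="output.txt"):
    """Write one "<directory> <average>" line per top-level directory."""
    # Opened once, so no directory's line overwrites another's
    with open(output, "w") as fh:
        for parent, avg in averages.items():
            fh.write("%s %s\n" % (parent, avg))
```

Run write_report(average_times(".")) from the directory that contains 1/ and 2/ to get one line per top-level directory. The question's loop reopens output.txt with "w" for every directory, so only the last directory's line survives. Opening the file once keeps all of them. Dividing by the number of finished runs rather than by 5 is a deliberate choice in this sketch.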