
Output with Python glob: cannot find the error in my Python code


I have the following code, which does NOT raise an error but also does not produce any output.

The script is made to do the following:

  • The script takes an input file of 4 tab-separated columns.

  • It then counts the unique values in Column 1 and, for each one, the frequency of the corresponding values in Column 4 (which contains 2 different tags: C and D).

  • The output is 3 tab-separated columns: Column 1 holds each unique value from Column 1 of the input, Column 2 holds how often that value occurs with Tag C, and Column 3 holds how often it occurs with Tag D.

Here is a sample of input:

algorithm-n   like-1-resonator-n   8.1848   C
algorithm-n   produce-hull-n   7.9104   C
algorithm-n   like-1-resonator-n   8.1848   D
algorithm-n   produce-hull-n   7.9104   D
anything-n   about-1-Zulus-n   7.3731   C
anything-n   above-shortage-n   6.0142   C
anything-n   above-1-gig-n   5.8967   C
anything-n   above-1-magnification-n   7.8973   C
anything-n   after-1-memory-n   2.5866   C

and here is a sample of the desired output:

algorithm-n   2   2
anything-n   5   0

This is the code I am using (it takes into account all the suggestions from the comments):

from collections import defaultdict, Counter 


def sortAndCount(opened_file):
    lemma_sense_freqs = defaultdict(Counter)    
    for line in opened_file:
        lemma, _, _, senseCode = line.split()
        lemma_sense_freqs[lemma][senseCode] += 1
    return lemma_sense_freqs

def writeOutCsv(output_file, input_dict):
    with open(output_file, "wb") as outfile:
        for lemma in input_dict.keys():
            for senseCode in input_dict[lemma].keys():
                outstring = "\t".join([lemma, senseCode,\
                str(input_dict[lemma][senseCode])])
                outfile.write(outstring + "\n")

import os
import glob


folderPath = "Python_Counter" # declare here

for input_file in glob.glob(os.path.join(folderPath, 'out_')):
    with open(input_file, "rb") as opened_file:
        lemma_sense_freqs = sortAndCount(input_file)
    output_file = "count_*.csv"
    writeOutCsv(output_file, lemma_sense_freqs)

My intuition is that the problem comes from the "glob" function. But, as I said before, the code itself DOES NOT give me an error; it just doesn't seem to produce any output either.

Can someone help?

I have referred to the documentation here and here, and I cannot seem to understand what I am doing wrong.

Can someone give me some insight into how to get the results written out for the files matched by glob? I have a large number of files to process.


Solution

  • Regarding your original code, lemma_sense_freqs is not defined because it should be returned by the function sortAndCount(), and you never call that function. Compare this with the second function in your code, writeOutCsv: you define it, and then you actually call it on the last line.

    You never call sortAndCount(), which is the function that should return the value of lemma_sense_freqs. Hence the error.

    I don't know exactly what you want to achieve with that code, but at some point (try just before the last line) you definitely need to write something like this:

    lemma_sense_freqs = sortAndCount(input_file)
    

    This is how you call the function you need; lemma_sense_freqs will then have a value assigned, and you should no longer get the error.

    I cannot be more specific because it is not clear exactly what you want to achieve with that code. For now, you are simply running into a basic issue: you defined a function but never called it to obtain the value of lemma_sense_freqs. Try adding the line I suggest and experiment with it.
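
For the updated code in the question, which already calls sortAndCount(), a few other things look likely to explain the silence. The pattern 'out_' contains no wildcard, so glob.glob() only matches a path literally named out_ and otherwise returns an empty list, which means the loop body never runs (no error, no output). Inside the loop, the file name is passed to sortAndCount() instead of the opened file, and "count_*.csv" is a literal file name, not a pattern. Below is a minimal sketch under stated assumptions, not a definitive fix: it assumes Python 3 (hence text mode rather than "rb"/"wb"), that the input files start with out_, and that one output file per input is wanted. The names sort_and_count, write_out_tsv and process_folder, the "out_*" pattern and the output naming scheme are illustrative choices, not taken from the question.

```python
import glob
import os
from collections import Counter, defaultdict


def sort_and_count(lines):
    """Count, for each lemma (column 1), how often each tag (column 4) occurs."""
    lemma_sense_freqs = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        lemma, _, _, sense_code = fields
        lemma_sense_freqs[lemma][sense_code] += 1
    return lemma_sense_freqs


def write_out_tsv(output_file, freqs, tags=("C", "D")):
    """Write one row per lemma: lemma, count for tag C, count for tag D."""
    with open(output_file, "w") as outfile:
        for lemma in sorted(freqs):
            # A Counter returns 0 for a tag that never occurred.
            counts = [str(freqs[lemma][tag]) for tag in tags]
            outfile.write("\t".join([lemma] + counts) + "\n")


def process_folder(folder_path, pattern="out_*"):
    """Process every file matching the pattern; return the files written."""
    written = []
    # Assumption: input files are named out_<something>; note the wildcard.
    for input_file in sorted(glob.glob(os.path.join(folder_path, pattern))):
        with open(input_file) as opened_file:
            # Pass the open file handle, not the file name.
            freqs = sort_and_count(opened_file)
        # Assumption: name each output after its input file.
        output_name = "count_" + os.path.basename(input_file) + ".csv"
        output_file = os.path.join(folder_path, output_name)
        write_out_tsv(output_file, freqs)
        written.append(output_file)
    return written
```

Calling process_folder("Python_Counter") would then write one count_... file next to each matching input file. If the returned list is empty, the pattern matched nothing, and printing glob.glob(os.path.join(folderPath, "out_*")) is a quick way to check that.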