Search code examples
pythonloopsfindrows

Finding the number of rows for all files within a folder


Hello I am trying to find the number of rows for all files within a folder. I am trying to do this for a folder that contains only ".txt" files and for a folder that contains ."csv" files.

I know that the way to get the number of rows for a SINGLE ".txt" file is something like this:

file = open("sample.txt","r") 
Counter = 0
  
Content = file.read() 
CoList = Content.split("\n") 
  
for i in CoList: 
    if i: 
        Counter += 1
          
print("This is the number of lines in the file") 
print(Counter) 

Whereas for a SINGLE ".csv" file is something like this:

file = open("sample.csv")
reader = csv.reader(file)
lines= len(list(reader))
print(lines)

But how can I do this for ALL files within a folder? That is, how can I loop each of these procedures across all files within a folder and, ideally, export the output into an excel sheet with columns akin to these:

Filename  Number of Rows
1.txt     900
2.txt     653

and so on and so on.

Thank you so much for your help.


Solution

  • You can use glob to detect the files and then just iterate over them.

    Other methods : How do I list all files of a directory?

    import glob
    
    # 1. list all text files in the directory
    rel_filepaths = glob.glob("*.txt")
    
    # 2. (optional) create a function to read the number of rows in a file
    def count_rows(filepath):
      res = 0
      f = open(filepath, 'r')
      res = len(f.readlines())
      f.close()
    
      return res
    
    # 3. iterate over your files and use the count_row function
    counts = [count_rows(filepath) for filepath in rel_filepaths]
    
    print(counts)
    

    Then, if you want to export this result in a .csv or .xslx file, I recommend using pandas.

    import pandas as pd
    
    # 1. create a new table and add your two columns filled with the previous values
    df = pd.DataFrame()
    df["Filename"] = rel_filepaths
    df["Number of rows"] = counts
    
    # 2. export this dataframe to `.csv`
    df.to_csv("results.csv")
    

    You can also use pandas.ExcelWriter() if you want to use the .xlsx format. Link to documentation & examples : Pandas - ExcelWriter doc