I have quite a lot of data; more precisely, an 8 GB .rpt file.
Before processing it I want to know how many rows it actually contains - this helps me estimate later how long the processing will take, etc. Reading an .rpt file of that size into memory as a whole obviously does not work, so I need to read it line by line. To find out the number of lines I wrote this simple Python script:
import pandas as pd

counter = 0
# chunksize=1 yields one-row DataFrame chunks, so counting chunks counts rows
for chunk in pd.read_fwf("test.rpt", chunksize=1):
    counter = counter + 1
print(counter)
This seems to work well; however, I realized that it is quite slow, and actually reading every line just to count them seems unnecessary.
Is there a way to get the number of rows without reading each line?
Many thanks
I'm not familiar with the .rpt file format, but if it can be read in as a text file (which I'm assuming it can, since you're using pd.read_fwf) then you can probably just use Python's builtins for input/output.
with open('test.rpt', 'r') as testfile:
    for i, line in enumerate(testfile):
        pass

# Add one to get the line count
print(i + 1)
This will allow you to (efficiently) iterate over each line of the file object. The builtin enumerate function will count each line as you read it.
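
If iterating line by line is still too slow for an 8 GB file, another option is to count newline bytes while reading the file in large binary blocks. This is only a minimal sketch, assuming the file uses ordinary '\n' line endings and ends with a trailing newline (otherwise you'd add one for the last partial line); the 1 MiB block size is an arbitrary choice you can tune:

def count_lines(path, block_size=1 << 20):
    """Count '\n' bytes by reading the file in binary blocks (1 MiB by default)."""
    count = 0
    with open(path, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:  # end of file
                break
            count += block.count(b'\n')
    return count

print(count_lines('test.rpt'))

Because this never builds a Python string per line, it usually runs noticeably faster than a plain line-by-line loop, though it still has to read the whole file once - there is no way to know the row count of a plain text file without scanning it.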