I have been trying to figure out how to accomplish this with the `for` loop and the `enumerate` object that Python gives me. I have a time in the format `HH:MM`, and a CSV file whose first column is a timestamp in that same format. I search the file for the matching time and extract that row, which is later converted into an XML file. However, I also need to extract the row before and the row after that target row. I have tried the following piece of code:
```python
def findRow(timeID, filename):
    rows = []
    csvFile = csv.reader(open(filename, "rb"), delimiter=",")
    for i, row in enumerate(csvFile):
        if timeID == timeInRow:
            rows.append(i-1)
            rows.append(i)
            rows.append(i+1)
    return rows
```
However, I realized shortly afterwards that this is not the correct way to do it, because I am extracting the indices and not the values. What I need is something like `row[i-1]`, `row[i]`, `row[i+1]`: in other words, the rows themselves at those positions, not their index numbers.
Is there an easy way to do this? I have thought about using `range(csvFile)`, but I honestly have no idea what that would end up doing.
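For comparison, here is a minimal sketch of the `enumerate` idea from the question, assuming the whole file fits in memory: reading the rows into a list first lets the index `i` be turned back into the rows themselves with a slice. The helper name `find_rows_enumerate` and the sample data are hypothetical, and `io.StringIO` stands in for a real file.

```python
import csv
import io

# Hypothetical sample data: the first column is an HH:MM timestamp.
SAMPLE = "08:00,a\n08:15,b\n08:30,c\n08:45,d\n"

def find_rows_enumerate(timeID, csv_file):
    """Return the matching row plus its neighbours (hypothetical helper)."""
    rows = list(csv.reader(csv_file, delimiter=","))  # keep the values, not just indices
    for i, row in enumerate(rows):
        if row[0] == timeID:
            # max() keeps i-1 from becoming -1, which would wrap to the last row;
            # slicing past the end is safe, so i+2 needs no clamping.
            return rows[max(i - 1, 0):i + 2]
    return []  # no match

print(find_rows_enumerate("08:15", io.StringIO(SAMPLE)))
```

A match on the first or last row simply returns two rows instead of three.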
I would use a different approach: use `next` to get the next row, and return the 3 rows like this (I added a comment because `timeInRow` should be extracted from `row`, but your code doesn't show it):
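```python
import csv

def findRow(timeID, filename):
    with open(filename, newline="") as f:  # Python 3; in Python 2 use open(filename, "rb")
        csvFile = csv.reader(f, delimiter=",")
        prev_row = []  # just in case it matches at the first row
        for row in csvFile:
            timeInRow = row[0]  # extract timeInRow from row (first column, per the question)
            if timeID == timeInRow:
                return [prev_row, row, next(csvFile, [])]
            prev_row = row  # save current row for the next iteration
```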
`next` is given a default value (an empty list) in case the last line matches; without it, `next` would raise a `StopIteration` exception.
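Here is a self-contained version of the approach above, run against an in-memory file so it can be tried directly. It assumes Python 3 and that the timestamp is in the first column; the name `find_row`, the "return `[]` when nothing matches" behaviour and the sample data are hypothetical choices for illustration.

```python
import csv
import io

# Hypothetical sample data: the first column is an HH:MM timestamp.
SAMPLE = "08:00,a\n08:15,b\n08:30,c\n"

def find_row(timeID, csv_file):
    reader = csv.reader(csv_file, delimiter=",")
    prev_row = []  # returned as-is if the match is on the first row
    for row in reader:
        if row[0] == timeID:  # timestamp assumed to be in the first column
            # The for loop and next() share the same iterator, so next()
            # returns the row right after the match, or [] at end of file.
            return [prev_row, row, next(reader, [])]
        prev_row = row  # remember this row for the next iteration
    return []  # no match (hypothetical choice)

for t in ("08:00", "08:15", "08:30"):
    print(t, find_row(t, io.StringIO(SAMPLE)))
```

Because the result always has three slots, a match on the first row starts with an empty list and a match on the last row ends with one, which the XML conversion step would need to handle.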
This linear approach works, but if the rows are sorted by time and you need to perform several searches, a better (faster) approach would probably be to create a list of rows and a list of times, then use the `bisect` module to compute the insertion point in the list of times, check that the time at that position matches, and use the index to return a slice of the list of rows.
Something like:
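```python
import bisect

list_of_rows = list(csvFile)  # read the whole (sorted) file once
list_of_times = [x[0] for x in list_of_rows]  # the time is in the first column

def findRows(timeID):  # hypothetical name: each call is one O(log n) search
    i = bisect.bisect_left(list_of_times, timeID)  # search the times, not the rows
    if i < len(list_of_times) and list_of_times[i] == timeID:
        return list_of_rows[max(i-1, 0):min(i+2, len(list_of_rows))]
```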
If you only need to perform one search, this is slower, because you have to build the lists anyway: O(n) + O(log n), whereas the linear scan is at most O(n) and stops at the match. But if you perform several time searches on the same list, the cost is only O(log n) per search once the lists are built.
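A minimal runnable sketch of the bisect approach, assuming the rows are already sorted by time and the timestamp is in the first column. Zero-padded `HH:MM` strings compare in chronological order as plain strings, so no time parsing is needed. It uses `bisect.bisect_left`, which returns the position of the first element equal to the key (plain `bisect.bisect` returns the position just after it, so the equality check would look at the wrong element). The class name `TimeIndex` and the sample data are hypothetical.

```python
import bisect
import csv
import io

class TimeIndex:
    """Build the lists once, then answer many lookups (hypothetical helper)."""

    def __init__(self, csv_file):
        self.rows = list(csv.reader(csv_file, delimiter=","))  # O(n), done once
        self.times = [row[0] for row in self.rows]  # assumed already sorted

    def find(self, timeID):
        i = bisect.bisect_left(self.times, timeID)  # O(log n) per lookup
        if i < len(self.times) and self.times[i] == timeID:
            # max() keeps i-1 from wrapping to -1; slicing clamps i+2 itself
            return self.rows[max(i - 1, 0):i + 2]
        return []  # no exact match

# Hypothetical sample data, sorted by the HH:MM timestamp in the first column.
SAMPLE = "08:00,a\n08:15,b\n08:30,c\n09:00,d\n"
index = TimeIndex(io.StringIO(SAMPLE))
for t in ("08:15", "09:00", "08:20"):
    print(t, index.find(t))
```

Keeping the lists on an object makes the O(n) construction cost explicit and paid once, while each `find` call is a single binary search.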