python, python-3.x, file, csv, file-processing

How to merge consecutive lines of a CSV file


I have a CSV file that contains the output of some processes run over video frames. Each line is labeled either fire or none and has a startTime and an endTime. I need to cluster consecutive fire lines and print a single entry per cluster with its start and end time. A few none lines in between can be tolerated if they span less than 1 second. In other words, the goal is to group detections from nearby frames together and smooth out the results: instead of multiple lines for 31-32, 32-33, ..., I want a single line covering 31-35 seconds.

How can I do that?

For instance, all of the following consecutive lines are considered a single detection, since the none gaps are within 1 second. The result would be something like 1,file1,name1,30.6,32.2,fire,0.83, with the score being the mean of all fire lines.

frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
...
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344

This is my attempt so far:

with open(filename) as fin:
    lastWasFire=False
    for line in fin:
        if "fire" in line:
             if lastWasFire==False and line !="" and line.split(",")[5] != lastline.split(",")[5]:
                  fout.write(line)
             else:
                lastWasFire=False
             lastline=line

Solution

  • I assume you don't want to use external libraries for data processing like numpy or pandas. The following code should be quite similar to your attempt:

    from itertools import chain

    threshold = 1.0

    # Chain a sentinel "none" line after the input so that no "fire" objects are left unprinted.
    # Its endTime is "inf", so the gap check below always exceeds the threshold,
    # even if the file ends with a short run of "none" lines.
    trigger = (",,,0,inf,,none,",)

    # Keys for columns of input data
    keys = (
        "frame_num",
        "uniqueId",
        "title",
        "startTime",
        "endTime",
        "startTime_fmt",
        "object",
        "score",
    )

    # Store last "fire" or "none" objects
    last = {
        "fire": [],
        "none": [],
    }

    with open(filename) as f:
        # Skip the header line of the input file
        next(f)
        for line in chain(f, trigger):
            line = dict(zip(keys, line.rstrip("\n").split(",")))
            last[line["object"]].append(line)
            # Check threshold for "none" objects if there are previous unprinted "fire" objects
            if line["object"] == "none" and last["fire"]:
                if float(last["none"][-1]["endTime"]) - float(last["none"][0]["startTime"]) > threshold:
                    print("{},{},{},{},{},{},{},{}".format(
                        last["fire"][0]["frame_num"],
                        last["fire"][0]["uniqueId"],
                        last["fire"][0]["title"],
                        last["fire"][0]["startTime"],
                        last["fire"][-1]["endTime"],
                        last["fire"][0]["startTime_fmt"],
                        last["fire"][0]["object"],
                        sum(float(x["score"]) for x in last["fire"]) / len(last["fire"]),
                    ))
                    last["fire"] = []
            # Previous "none" objects don't matter anymore as soon as a "fire" object is encountered
            if line["object"] == "fire":
                last["none"] = []


    The input file is processed line by line, and "fire" objects are accumulated in last["fire"]. They are merged and printed when either

    • the run of "none" objects in last["none"] exceeds the limit defined in threshold,

    • or the end of the input file is reached. The manually chained trigger object is a "none" object with an infinite endTime, so it always exceeds the threshold and forces the final merge and print.
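The two flush conditions above can also be sketched as a standalone function over already-parsed rows. This is only an illustration under some assumptions: the name merge_fire_runs is hypothetical, rows are dicts with the column names from the question, any label other than "fire" is treated as a gap, and the final flush happens explicitly after the loop rather than through a chained sentinel line.

```python
import csv
import io


def merge_fire_runs(rows, threshold=1.0):
    """Merge consecutive "fire" rows, tolerating gaps of up to `threshold` seconds.

    `rows` is an iterable of dicts with at least the keys
    startTime, endTime, object and score (hypothetical helper, not from the answer).
    """
    merged = []
    fires = []        # "fire" rows collected for the current run
    gap_start = None  # startTime of the first non-fire row since the last "fire"

    def flush():
        # Emit one merged row: first row's fields, last row's endTime, mean score.
        if fires:
            mean = sum(float(r["score"]) for r in fires) / len(fires)
            merged.append({**fires[0], "endTime": fires[-1]["endTime"], "score": mean})
            fires.clear()

    for row in rows:
        if row["object"] == "fire":
            fires.append(row)
            gap_start = None  # a "fire" row ends the current gap
        else:
            if gap_start is None:
                gap_start = float(row["startTime"])
            # Condition 1: the gap has grown beyond the threshold.
            if float(row["endTime"]) - gap_start > threshold:
                flush()
    # Condition 2: end of input, flush whatever is still pending.
    flush()
    return merged


if __name__ == "__main__":
    sample = (
        "frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score\n"
        "10,file1,name1,30.6,30.64,0:00:30,fire,0.914617\n"
        "11,file1,name1,30.72,30.76,0:00:30,none,0.68788\n"
        "12,file1,name1,30.84,30.88,0:00:30,fire,0.993345\n"
    )
    for row in merge_fire_runs(csv.DictReader(io.StringIO(sample))):
        print(row)
```

Flushing after the loop avoids having to construct an artificial input line, at the cost of one extra call outside the loop.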

    You could replace print with a call to write into an output file, of course.
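
One way to write the merged lines to a file instead of printing them is the standard csv module. The sketch below assumes each merged row is a dict keyed by the column names from the question; the helper name write_merged and the FIELDS list are illustrative, not part of the answer above.

```python
import csv

# Column order taken from the header line shown in the question.
FIELDS = [
    "frame_num", "uniqueId", "title", "startTime",
    "endTime", "startTime_fmt", "object", "score",
]


def write_merged(rows, fout):
    """Write merged rows (dicts keyed by FIELDS) as CSV to an open text file."""
    writer = csv.DictWriter(fout, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

It could be called as write_merged(merged, fout) inside a with open("merged.csv", "w", newline="") as fout: block, where merged.csv is a placeholder output name. Opening the file with newline="" is what the csv module documentation recommends for files passed to a writer.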