Search code examples
pythonalgorithmtime-seriesmissing-datadata-processing

Algorithm for finding missed data capture by device


We have 4 cameras "camA", "camB", "camC", "camD" installed at every junction along a road and in this sequence. The flow of traffic is such "camA" is at the beginning, and "camD" is at the end.

        |      |        |      |        |      |        |    |
--------        --------        --------        --------      -------
===>      camA            camB            camC            camD
---------------------------------------------------------------------

Whenever a vehicle passes a junction or makes a turn at the junction, the camera is positioned to record down the license plate number. A working camera captures the plate number MOST of the time, but when it misses a certain % of cars in the past N cars, we consider it a "bad" camera.

For example, if a car with plate "plateAlice" drives forward without turning, and a second car with plate "plateTom" makes a turn at "camC" junction, we will record down the captures:

{
    "camA":[
        (1655529900, "plateAlice"),
        (1655530010, "plateTom"),
    ],
    "camB":[
        (1655529920, "plateAlice"),
        (1655530020, "plateTom"),
    ],
    "camC":[
        (1655529970, "plateAlice"),
        (1655530050, "plateTom"),
    ],
    "camD":[
        (1655529980, "plateAlice"),
    ]
}

Is there a way to take the past X records from all 4 cameras and identify when a camera has gone bad? For example, if a camera misses 5 out of the past 10 cars, we consider it as a bad camera.

Assumptions

  1. Multiple cameras can be bad, but at least 1 camera will always be working.
  2. A vehicle can return back to "camA" again, so plate numbers can be recorded multiple times by a camera when the vehicle returns.
  3. A vehicle can only make a turn at a junction if its plate number has been recorded by the camera (strange assumption to simplify things)

Python solution preferred, I have a non-working attempt in Python below. Thanks!


My non-working attempt in Python:

from collections import namedtuple

Record = namedtuple("record", ["time", "plate"])


# 10 vehicles (counting repeated vehicles that circle back again)
data = {
    "camA":[
        Record(1655530000, "plateAlice"),
        Record(1655530110, "plateTom"),
        Record(1655530210, "plateMissing"),
        Record(1655530310, "plateCharlie"),
        Record(1655530410, "plateDan"),
        Record(1655530510, "plateEdward"),
        Record(1655530610, "plateMissing"), # looped back for a 2nd time
        Record(1655530700, "plateAlice"), # looped back for a 2nd time
        Record(1655530800, "plateAlice"), # looped back for a 3rd time
        # missing "plateTom" (looped back for a 2nd time) because bad camera
        # missing "plateFrank" because bad camera
    ],
    "camB":[
        Record(1655530020, "plateAlice"),
        Record(1655530120, "plateTom"),
        # missing "plateMissing" because bad camera
        # missing "plateCharlie" because bad camera
        # missing "plateDan" because bad camera
        # missing "plateEdward" because bad camera
        Record(1655530620, "plateMissing"),
        # missing "plateAlice" because bad camera
        Record(1655530820, "plateAlice"), 
        # missing "plateFrank" because bad camera
    ],
    "camC":[
        Record(1655530070, "plateAlice"),
        Record(1655530150, "plateTom"),   # Makes a turn at this junction
        Record(1655530230, "plateMissing"),
        Record(1655530330, "plateCharlie"),
        Record(1655530430, "plateDan"),
        Record(1655530530, "plateEdward"),
        Record(1655530630, "plateMissing"),
        Record(1655530730, "plateAlice"), 
        Record(1655530830, "plateAlice"), 
        Record(1655530930, "plateTom"), # looped back for a 2nd time
        Record(1655530930, "plateFrank"),   # Makes a turn at this junction
    ],
    "camD":[
        Record(1655530080, "plateAlice"),
        # missing "plateTom" because it turned at "camC"
        Record(1655530240, "plateMissing"),
        Record(1655530340, "plateCharlie"),
        Record(1655530440, "plateDan"),
        # missing "plateEdward" because bad camera
        Record(1655530640, "plateMissing"),
        Record(1655530740, "plateAlice"), 
        Record(1655530840, "plateAlice"), 
        Record(1655530940, "plateTom"), # looped back for a 2nd time
    ]
}

def is_first_cam(cam):
    return cam == "camA"

def check_for_bad_cameras(data):
    misses = {
        "camA": 0,
        "camB": 0,
        "camC": 0,
        "camD": 0,
    }
    
    for r in data["camA"][::-1]:
        for i, cam in enumerate(data):
            if not is_first_cam(cam):
                for s in data[cam][::-1]:
                    if s.time > r.time and s.plate == r.plate:
                        break
                else:
                    misses[cam] += 1
        
    print(misses)   
    # Expected {'camA': 2, 'camB': 6, 'camC': 0, 'camD': 1}
    # Obtained {'camA': 0, 'camB': 3, 'camC': 0, 'camD': 1}
    
    
check_for_bad_cameras(data)

Solution

  • Plates should be seen by cameras in order ABCD, ABC, AB or A. Anything else misses at least the previous camera.

    Note: Missing D is invisible as it is equivalent to a turn-off at C.

    from collections import namedtuple, defaultdict
    from heapq import merge
    
    
    Record = namedtuple("record", ["time", "plate"])
    
    
    # data from your example.
    # data = ... # Later CPythons key order preservation assumed.
    
    cam2idx = {cam: i for i, cam in enumerate(data)}  # order cams should appear
    idx2cam = {i: cam for i, cam in enumerate(data)}
    last_idx = defaultdict(lambda :-1)
    time_plate_cam = [[(rec.time, rec.plate, cam) for rec in records]
                      for cam, records in data.items()]
    for time, plate, cam in merge(*time_plate_cam):  # sort by time
        idx = cam2idx[cam]
        prev = last_idx[plate]
        if idx != prev + 1 and idx != 0:
            print(f"{time}: {plate} appears suddenly at {cam} "
                  f"missing {idx2cam[idx-1]}")
        last_idx[plate] = idx
    

    Result:

    1655530230: plateMissing appears suddenly at camC missing camB
    1655530330: plateCharlie appears suddenly at camC missing camB
    1655530430: plateDan appears suddenly at camC missing camB
    1655530530: plateEdward appears suddenly at camC missing camB
    1655530730: plateAlice appears suddenly at camC missing camB
    1655530930: plateTom appears suddenly at camC missing camB
    1655530930: plateFrank appears suddenly at camC missing camB