Search code examples
algorithmopencvcomputer-visionproofcorrectness

How do people prove the correctness of Computer Vision methods?


I'd like to pose a few abstract questions about computer vision research. I haven't quite been able to answer these questions by searching the web and reading papers.

  1. How does someone know whether a computer vision algorithm is correct?
  2. How do we define "correct" in the context of computer vision?
  3. Do formal proofs play a role in understanding the correctness of computer vision algorithms?

A bit of background: I'm about to start my PhD in Computer Science. I enjoy designing fast parallel algorithms and proving the correctness of these algorithms. I've also used OpenCV from some class projects, though I don't have much formal training in computer vision.

I've been approached by a potential thesis advisor who works on designing faster and more scalable algorithms for computer vision (e.g. fast image segmentation). I'm trying to understand the common practices in solving computer vision problems.


Solution

  • In practice, computer vision is more like an empirical science: You gather data, think of simple hypotheses that could explain some aspect of your data, then test those hypotheses. You usually don't have a clear definition of "correct" for high-level CV tasks like face recognition, so you can't prove correctness.

    Low-level algorithms are a different matter, though: You usually have a clear, mathematical definition of "correct" here. For example if you'd invent an algorithm that can calculate a median filter or a morphological operation more efficiently than known algorithms or that can be parallelized better, you would of course have to prove it's correctness, just like any other algorithm.

    It's also common to have certain requirements to a computer vision algorithm that can be formalized: For example, you might want your algorithm to be invariant to rotation and translation - these are properties that can be proven formally. It's also sometimes possible to create mathematical models of signal and noise, and design a filter that has the best possible signal to noise-ratio (IIRC the Wiener filter or the Canny edge detector were designed that way).

    Many image processing/computer vision algorithms have some kind of "repeat until convergence" loop (e.g. snakes or Navier-Stokes inpainting and other PDE-based methods). You would at least try to prove that the algorithm converges for any input.