Tags: computer-vision, anomaly-detection

How to convert video volumes (after dense sampling) with different scales to descriptor?




I read this article (link) and am trying to understand the algorithm presented there.
I now understand almost all of it, but I have one question:

How to convert video volumes (after dense sampling) with different scales to descriptor?

As I understand it, if I have a video of 100 frames at 120*160 and apply dense sampling at several scales (for example [5*5*5, 10*10*10, 20*20*20]), I get [15360, 1920, 240] non-overlapping cubes respectively. After that I need to build a descriptor for each cube, and all descriptors must have the same length, but in this article the descriptor length equals the cube size, so the lengths differ: [125, 1000, 8000].
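The cube counts above follow from integer division of each dimension by the cube side. A quick sketch to verify them (the dimensions are from the question; the non-overlapping tiling is my assumption):

```python
# Count non-overlapping cubes per scale for a 100-frame 120x160 video.
T, H, W = 100, 120, 160  # frames, height, width

for s in (5, 10, 20):
    n_cubes = (T // s) * (H // s) * (W // s)  # cubes tiling the volume
    desc_len = s ** 3                         # voxels per cube = descriptor length
    print(f"scale {s}: {n_cubes} cubes, descriptor length {desc_len}")
# scale 5: 15360 cubes, descriptor length 125
# scale 10: 1920 cubes, descriptor length 1000
# scale 20: 240 cubes, descriptor length 8000
```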

One solution I can think of is to build a cube at each scale around every pixel and then concatenate them into a single vector of length 125 + 1000 + 8000 = 9125. Is that right?


Solution

  • So, I've found the answer.
    Around each pixel I must build one cube at each scale, so there will be
    100 * 120 * 160 = 1,920,000 cubes per scale.
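A minimal sketch of this per-pixel extraction (my own illustration, not the paper's code; `pixel_descriptor`, the edge padding, and the scale set are assumptions):

```python
import numpy as np

def pixel_descriptor(video, t, y, x, scales=(5, 10, 20)):
    """Concatenate cubes of several scales centered on one pixel.

    video: array of shape (frames, height, width).
    Returns a vector of length 125 + 1000 + 8000 = 9125 for the default scales.
    """
    parts = []
    for s in scales:
        h = s // 2
        # Edge-pad so cubes near the video border are still full-sized
        # (the paper may handle borders differently; this is an assumption).
        padded = np.pad(video, ((h, s), (h, s), (h, s)), mode="edge")
        # padded index t corresponds to original index t - h, so this slice
        # is the s*s*s cube centered (up to rounding) on (t, y, x).
        cube = padded[t:t + s, y:y + s, x:x + s]
        parts.append(cube.ravel())
    return np.concatenate(parts)

video = np.random.rand(100, 120, 160)
d = pixel_descriptor(video, 50, 60, 80)
print(d.shape)  # (9125,)
```

Doing this for every pixel of a 100-frame 120*160 video gives the 1,920,000 descriptors per scale mentioned above, so in practice you would extract them lazily or on a sampled grid rather than storing them all at once.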