Tags: c, machine-learning, artificial-intelligence, bayesian, data-analysis

Detect a certain characteristic in a data set


So basically I have a dataset with 2 columns:

| Time (ms) | Speed (m/s) |
|-----------|-------------|
| 0         | 0.5         |
| 20        | 1.5         |
| 40        | 4.5         |
| 60        | 8.5         |
| 80        | 8.9         |
| 100       | 7.5         |
| 120       | 4.3         |
| 140       | 1.5         |
| 160       | 0.5         |
| 180       | 0.5         |
| 200       | 0.5         |
| 220       | 0.5         |

This is a short sample of a person running, with their speed recorded every 20 milliseconds.

So I'm trying to detect sprints (when the person is running at full speed over a short distance).

Due to the nature of my requirements, I'm writing a program to calculate this in C. I can easily do it in a dirty manner: define some minimum and maximum thresholds, look for peaks, and there's the sprint. But I think there must be a better way to do it, maybe some machine learning algorithm I'm not aware of.
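For reference, the threshold-based ("dirty") approach might look something like the minimal sketch below, assuming the samples arrive every 20 ms in an array of doubles. The names `detect_sprints`, `Sprint`, `SPRINT_MIN_SPEED` and `SPRINT_MIN_SAMPLES` are hypothetical, and the 6.0 m/s and two-sample values are illustrative assumptions, not an established definition of a sprint.

```c
#include <stddef.h>

/* Hypothetical tuning parameters: illustrative assumptions that would need
   adjusting for the athletes and sensor in question. */
#define SPRINT_MIN_SPEED   6.0  /* m/s: a sample at or above this counts as fast */
#define SPRINT_MIN_SAMPLES 2    /* minimum consecutive fast samples (2 x 20 ms) */

typedef struct {
    size_t start;  /* index of the first sample in the sprint */
    size_t end;    /* index of the last sample in the sprint (inclusive) */
    double peak;   /* highest speed observed during the sprint, m/s */
} Sprint;

/* Scan speed samples and record every run of at least SPRINT_MIN_SAMPLES
   consecutive samples at or above SPRINT_MIN_SPEED. Writes up to max_out
   sprints into out and returns how many were written. */
size_t detect_sprints(const double *speed, size_t n, Sprint *out, size_t max_out)
{
    size_t found = 0;
    size_t i = 0;
    while (i < n) {
        if (speed[i] < SPRINT_MIN_SPEED) {
            i++;  /* slow sample: keep scanning */
            continue;
        }
        size_t start = i;
        double peak = speed[i];
        /* Extend the run while the speed stays high, tracking the peak. */
        while (i < n && speed[i] >= SPRINT_MIN_SPEED) {
            if (speed[i] > peak) peak = speed[i];
            i++;
        }
        /* i now points one past the end of the fast run. */
        if (i - start >= SPRINT_MIN_SAMPLES && found < max_out) {
            out[found].start = start;
            out[found].end = i - 1;
            out[found].peak = peak;
            found++;
        }
    }
    return found;
}
```

On the sample table above, this reports one sprint covering indices 3 to 5 (60 ms to 100 ms) with a peak of 8.9 m/s.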

It would be great if I could teach the program what a sprint is by showing it some examples, and then have it detect sprints with no further intervention from me. I'm just not sure how to get started on that.
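Probably the simplest form of "teaching by example" is a decision stump, a one-feature supervised learner: given hand-labelled samples, it picks the speed threshold that misclassifies the fewest of them. This is a minimal sketch that assumes each 20 ms sample is labelled individually; `LabeledSample` and `learn_speed_threshold` are hypothetical names, and a practical model would likely need more features (duration, acceleration) than speed alone.

```c
#include <stddef.h>
#include <stdint.h>

/* One hand-labelled training example: a speed sample and whether the
   person labelling the data considered it part of a sprint. */
typedef struct {
    double speed;   /* m/s */
    int is_sprint;  /* 1 = sprint, 0 = not a sprint */
} LabeledSample;

/* Learn a decision stump: the threshold t for which the rule
   "speed >= t means sprint" misclassifies the fewest training examples.
   Candidate thresholds are the observed speeds themselves; on ties the
   first candidate wins. Returns the best threshold and, if errors_out is
   non-NULL, stores its misclassification count there. */
double learn_speed_threshold(const LabeledSample *ex, size_t n, size_t *errors_out)
{
    double best_t = 0.0;
    size_t best_err = SIZE_MAX;
    for (size_t c = 0; c < n; c++) {
        double t = ex[c].speed;
        size_t err = 0;
        /* Count how many examples this candidate threshold gets wrong. */
        for (size_t i = 0; i < n; i++) {
            int predicted = ex[i].speed >= t;
            if (predicted != ex[i].is_sprint) err++;
        }
        if (err < best_err) {
            best_err = err;
            best_t = t;
        }
    }
    if (errors_out) *errors_out = (n > 0) ? best_err : 0;
    return best_t;
}
```

If the three samples from 60 ms to 100 ms in the table are labelled as sprint and the rest are not, the learned threshold is 7.5 m/s with zero training errors. Note that the stump can only reproduce whatever rule the labeller was already applying implicitly.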

Has anyone come across something similar and can point me in the right direction?


Solution

  • This feels like using a bazooka to kill a fly; I think your "dirty" method is the only way to go. The term "sprint" has no objective meaning: to feed any machine examples of a sprint, you must already have used your own, arbitrary classification method to decide whether each example is or is not a sprint.

    How would you define the problem? Some people may run 10 m/s flat-out, while others may run 3 m/s and consider that a sprint. How could you tell from your limited dataset whether the person was pushing themselves to a sprint-worthy limit at the time? Perhaps they could have gone faster. How should one person's results influence the "sprint" threshold for another person? Lots of questions, but I think valid ones. Really, you can only make inferences from your own data: for example, how the maximum speed compares with the mean. I wouldn't over-complicate it.

    If, however, you collected the results in a controlled way, asking many people to sprint and recording factors that affect their ability (BMI, weight, age, medical conditions, head wind speed and so forth), then you might have something that would benefit from machine learning.
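Comparing the maximum speed with the mean, as suggested above, gives a per-person relative rule. The sketch below is one possible version; `flag_relative_sprint`, `PEAK_FRACTION` (0.8) and `MIN_PEAK_RATIO` (2.0) are hypothetical names and illustrative values, not tested thresholds.

```c
#include <stddef.h>

/* Hypothetical parameters, chosen for illustration only: */
#define PEAK_FRACTION  0.8  /* a sample counts if it is >= 80% of the session peak */
#define MIN_PEAK_RATIO 2.0  /* the peak must be >= 2x the mean to count at all */

/* Sets flags[i] = 1 for samples at or above PEAK_FRACTION of this person's
   own peak speed, provided that the peak stands out from the session mean
   by at least MIN_PEAK_RATIO; all other flags are set to 0. Returns the
   number of flagged samples (0 means no sprint was detected). */
size_t flag_relative_sprint(const double *speed, size_t n, int *flags)
{
    if (n == 0) return 0;
    double sum = 0.0;
    double peak = speed[0];
    for (size_t i = 0; i < n; i++) {
        sum += speed[i];
        if (speed[i] > peak) peak = speed[i];
        flags[i] = 0;
    }
    double mean = sum / (double)n;
    /* No clear peak relative to the mean: treat as a steady effort. */
    if (peak < MIN_PEAK_RATIO * mean) return 0;
    double threshold = PEAK_FRACTION * peak;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (speed[i] >= threshold) {
            flags[i] = 1;
            count++;
        }
    }
    return count;
}
```

For the sample table, the mean is about 3.27 m/s and the peak is 8.9 m/s, so the threshold becomes 0.8 × 8.9 = 7.12 m/s and the samples at 60, 80 and 100 ms are flagged. A session at a steady 5 m/s flags nothing, because its peak never stands out from its mean. Since the threshold is relative to each person's own data, someone whose top speed is 3 m/s can still register a sprint.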