I’m very new to machine learning. I have a dataset with data given me by a f1 race. User is playing this game and is giving me this dataset. With machine learning, I have to work with this data and when a user (I know they are 10) plays a game I have to recognize who’s playing.
The data consists of datagram packet occurred in 1/10 second freq, the packets contains the following Time, laptime, lapdistance, totaldistance, speed, car position, traction control, last lap time, fuel, gear,..
I’ve thought to use a kmeans used in a supervised way. Which algorithm could be better?
this is somewhat a broad question, so I'll try my best
kmeans is unsupervised algorithm meaning it will find the classes itself and it best used when you know there are multiple classes but you don't know what exactly they are... using it with labeled data just means you will compute the distance of new vector v to each vector in the dataset and pick the one (or ones using majority vote) which give the min distance , this is not considered as machine learning
in this case when you do have the labels, supervised approach will yield much better results
I suggest try random forest and logistic regression at first, those are the most basic and common algorithms and they give pretty good results
if you haven't achieve the desired accuracy you can use deep learning and build a neural network with input layer as big as your packet's values and output layer of the number of classes, in between you can use one or multiple hidden layers with various nodes, but this is advanced approach and you better pick up some experience in machine learning field before pursue it
Note: the data is a time series, meaning that every driver has it's own behaviour of driving a car, so data should be considered as bulks of points, with this you can apply pattern matching technics, also there are a several neural networks build exactly for this data (like RNN) but this is far far advanced and much more difficult to implement