If you record all IP traffic (using Wireshark or a similar program) while browsing the internet, you'll find many packets that were not sent as part of your browsing activity.
My question is:
If you wish to classify the packets (sent from your PC) into two groups:
1) packets sent as part of your browsing activity
2) all other packets
how would you use machine learning to solve this problem?
You can assume the packet payload can't be used for this purpose because it is either encapsulated or encrypted, so only packet headers can be used, e.g. TCP window size, TCP flag bits, packet length and packet direction.
Sounds like a binary classification problem.
There are three basic approaches you might use:
In each of the above cases you will need to prepare a set of features to represent your data. This can be either a fixed set of hand-picked features, or you might try treating the packet header as raw text and training a text-based model, such as a convolutional neural network.
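
As a rough illustration of the "fixed set of features" option, here is a minimal sketch that turns the header fields mentioned in the question into a numeric vector. The `PacketHeader` record, its field names, and the scaling constants are assumptions made for this example; they are not taken from any capture library.

```python
from dataclasses import dataclass

# Hypothetical header record; the field names are assumptions for illustration.
@dataclass
class PacketHeader:
    length: int        # total packet length in bytes
    window_size: int   # TCP window size field (16-bit)
    tcp_flags: int     # TCP flag bits packed into an integer
    outbound: bool     # True if the packet was sent from the local PC

# Standard TCP flag bit positions, in the order used for the feature vector.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def to_features(pkt):
    """Map one packet header to a fixed-length list of floats."""
    # One binary feature per TCP flag.
    flags = [1.0 if pkt.tcp_flags & bit else 0.0 for bit in FLAG_BITS.values()]
    return [
        pkt.length / 1500.0,         # scaled by a typical Ethernet MTU (assumption)
        pkt.window_size / 65535.0,   # scaled by the maximum 16-bit window value
        1.0 if pkt.outbound else 0.0,
    ] + flags

# Example: a full-size outbound SYN+ACK packet.
example = to_features(PacketHeader(length=1500, window_size=65535,
                                   tcp_flags=0x12, outbound=True))
```

Vectors like this, paired with labels (browsing / other), could then be fed to whichever binary classifier you choose.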