I am working on a flood control for a chat system, one of the ideas was to check how equal the past message is based on the newest message a member has sent within X minutes.
So if the member's lastest message was sent with 5 minutes of his past message it will check how equal the past message is against the latest message he sent, if that hits 80% or more he would be not be able to talk for a while.
Problem is I don't know how this sort of algorithm would look like and I am not sure if it would be an efficient approach either...
Let's go to the facts, user sends:
[00:00:01] MemberX: Hi everyone !
[00:00:02] MemberX: Hi everyone ! MUAH
[00:00:03] MemberX: Hi everyone ! 1
So in the above context the user would have his talking access removed for X minutes.
I guess I could checksum the message which would work for sequential messages like those where text is add at the end.
How would I calculate the percentage of match?
Out of the byte length of the past message against the byte length of the latest message that matched?
Example:
(9/10)*100 = 90%
Now let's go a little harder:
[00:00:01] MemberX: Hi hey everyone !
[00:00:02] MemberX: Hi everyone ! MUAH
[00:00:03] MemberX: Hi 123 everyone !
In this second case checksum would fail and would not be usable at all, I believe.
Is there a good algorithm to catch flood like that? I don't want to catch 100% of it but at least a small percentage to make the room cleanner.
The first part of it would work for a lot of abusers but some of the smarter folks would think of the 2nd way there is probably a lot other ways too, this is just an initial idea of things I could implement.
I don't want to restrain all users from talking with a flood time limit as most of them do type fast. I just want to catch people sending repeatable text over and over within a small range of time.
So my question is what would be a good algorithm to overcome this sort of flood?
Many IRC servers use a "Leaky Bucket" approach to throttle users to a constant rate. They keep track of the delta-time between the user's last messages sent and use that to calculate a "rate". This is often implemented as a per-user queue of messages to be sent. If the user goes above the rate they are throttled, unless they exceed the rate by a given amount at which point they are banned.
Another common approach on IRC is to simply keep track of the last N messages, and if some threshold of repeatability (i.e. the same message over and over again) is exceeded to kick/ban the user.