Flood control, check how equal the past message is compared with the latest message in %

I am working on a flood control for a chat system, one of the ideas was to check how equal the past message is based on the newest message a member has sent within X minutes.

So if the member's lastest message was sent with 5 minutes of his past message it will check how equal the past message is against the latest message he sent, if that hits 80% or more he would be not be able to talk for a while.

Problem is I don't know how this sort of algorithm would look like and I am not sure if it would be an efficient approach either...

Let's go to the facts, user sends:

[00:00:01] MemberX: Hi everyone !
[00:00:02] MemberX: Hi everyone ! MUAH
[00:00:03] MemberX: Hi everyone ! 1

So in the above context the user would have his talking access removed for X minutes.

I guess I could checksum the message which would work for sequential messages like those where text is add at the end.

How would I calculate the percentage of match?

Out of the byte length of the past message against the byte length of the latest message that matched?

Example:

past message 10 bytes
latest message 14 bytes
checksum matched up to 9 bytes: (9/10)*100 = 90%

Now let's go a little harder:

[00:00:01] MemberX: Hi hey everyone !
[00:00:02] MemberX: Hi everyone ! MUAH
[00:00:03] MemberX: Hi 123 everyone !

In this second case checksum would fail and would not be usable at all, I believe.

Is there a good algorithm to catch flood like that? I don't want to catch 100% of it but at least a small percentage to make the room cleanner.

The first part of it would work for a lot of abusers but some of the smarter folks would think of the 2nd way there is probably a lot other ways too, this is just an initial idea of things I could implement.

I don't want to restrain all users from talking with a flood time limit as most of them do type fast. I just want to catch people sending repeatable text over and over within a small range of time.

So my question is what would be a good algorithm to overcome this sort of flood?

Solution

Many IRC servers use a "Leaky Bucket" approach to throttle users to a constant rate. They keep track of the delta-time between the user's last messages sent and use that to calculate a "rate". This is often implemented as a per-user queue of messages to be sent. If the user goes above the rate they are throttled, unless they exceed the rate by a given amount at which point they are banned.

Another common approach on IRC is to simply keep track of the last N messages, and if some threshold of repeatability (i.e. the same message over and over again) is exceeded to kick/ban the user.