I have an active voting system which essentially comes down to a database table, with columns "votefrom" and "voteto".
Basically, I'm trying to work out an alert system that can detect abuse. One such case if finding a way to detect reciprocating votes (people who have the tendency to mainly vote for each other), or perhaps clusters of such votes.
The ideal outcome would be a list of users, where next to each user, there is a matched user and a percentage, showing that for instance user A is doing 10% of his votes to user B. This obviously in descending order, so that users with most tendency to vote specifically for one other user are on top.
So, my question is, would this be possible in queries alone, or how far can SQL get me?
Any pointers to existing implementations? (preferrably PHP)
As an example only (using MS SQL Server) but works in any SQL variant that supports COUNT() OVER()
CREATE TABLE mytable(
votefrom INTEGER
,voteto INTEGER
);
INSERT INTO mytable(votefrom,voteto) VALUES (100,200);
INSERT INTO mytable(votefrom,voteto) VALUES (100,200);
INSERT INTO mytable(votefrom,voteto) VALUES (100,200);
INSERT INTO mytable(votefrom,voteto) VALUES (100,200);
INSERT INTO mytable(votefrom,voteto) VALUES (100,200);
INSERT INTO mytable(votefrom,voteto) VALUES (100,300);
INSERT INTO mytable(votefrom,voteto) VALUES (100,400);
INSERT INTO mytable(votefrom,voteto) VALUES (100,500);
INSERT INTO mytable(votefrom,voteto) VALUES (100,600);
INSERT INTO mytable(votefrom,voteto) VALUES (100,700);
query:
select distinct
votefrom
, voteto
, count(*) over(partition by votefrom, voteto) fromto
, count(*) over(partition by votefrom) fromtot
, count(*) over(partition by votefrom, voteto) * 100 / count(*) over(partition by votefrom) pct
from mytable
result:
+----+----------+--------+--------+---------+-----+
| | votefrom | voteto | fromto | fromtot | pct |
+----+----------+--------+--------+---------+-----+
| 1 | 100 | 200 | 5 | 10 | 50 |
| 2 | 100 | 300 | 1 | 10 | 10 |
| 3 | 100 | 400 | 1 | 10 | 10 |
| 4 | 100 | 500 | 1 | 10 | 10 |
| 5 | 100 | 600 | 1 | 10 | 10 |
| 6 | 100 | 700 | 1 | 10 | 10 |
+----+----------+--------+--------+---------+-----+
see: http://rextester.com/UESP57757
Syntax in MySQL would be quite different while that dbms does not support COUNT() OVER()