I am new to Squid and I would like to know whether there is an option in Squid to set a limit on the number of hits per domain per day when users are using the proxy. For example, I would like to set a limit of 100 hits for the site http://example.com, and once the limit is crossed the domain should be blocked.
Any suggestions are appreciated.
The short answer is no: there is no Squid option to limit the number of hits against a domain. Squid does, however, have the ability to launch custom helper scripts and talk to them over stdin/stdout, sending request data to the script and using its responses to decide whether to allow or deny a request. You could use that functionality to solve this problem. Two approaches I can think of:
Option 1: write a helper to accept the domain being accessed, maintain a tally of all the domains it has seen and the number of times each one has been accessed, then return a response to Squid indicating whether the domain is over/under threshold, which Squid uses to allow/deny the request. The advantage with this approach is that all the logic is contained in one script. The disadvantage is that Squid dynamically launches multiple helpers in response to load, which means you would need a way for each instance of the script to share state, and for that state to survive a shutdown/startup of the Squid process.
Option 2: split the above logic into two scripts, one to watch the logfile and maintain a tally of how many times each domain has been accessed, and a second script which Squid can launch as an external helper, to consult the tally maintained by the first script and return a response to Squid indicating whether the domain is over or under threshold. The advantage with this approach is that it can scale out to support multiple helpers, and also survives a reload/restart of the Squid process.
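To make the Option 1 caveat concrete, here is a minimal sketch of such a helper (the 100-hit limit and variable names are assumptions, and this is illustrative only: each helper instance Squid launches keeps its own independent tally, and all counts vanish on restart):

```perl
#!/usr/bin/perl
# Sketch of Option 1: one helper keeping its own in-memory tally.
# Caveat noted above: each helper instance Squid launches has its own
# %tally, and all counts are lost when Squid restarts.
use strict;
use warnings;

$| = 1;              # autoflush so Squid sees each reply immediately

my %tally;           # domain => hits seen by this instance only
my $limit = 100;     # assumed per-day threshold

# Record one hit and decide whether the domain is over its limit.
# Returns 'OK' (ACL matches, i.e. deny) or 'ERR' (still under limit).
sub check_domain {
    my ($domain) = @_;
    $tally{$domain}++;
    return ($tally{$domain} > $limit) ? 'OK' : 'ERR';
}

while (my $line = <STDIN>) {
    chomp($line);
    my ($domain) = split(/\s+/, $line);
    print check_domain($domain), "\n" if (defined($domain) && length($domain));
}
```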
First, your Squid configuration:
logformat domainonly %>rd
access_log daemon:/tmp/domains.log logformat=domainonly
external_acl_type domaincheck concurrency=0 %>rd /tmp/domaincheck.pl
acl overlimit-domains external domaincheck
http_access deny overlimit-domains
Second, a script to watch the log. I recommend you place this in your crontab and fire it off at an interval that balances the load on your system against how much over-limit access you are willing to tolerate before hits become visible in the tally files and get blocked by Squid. You should also make sure Squid rotates its logs once a day, and set up a separate script to run at midnight to clear out all the files in $basedir, zeroing out the tally files in preparation for the next day.
#!/usr/bin/perl
# File: /tmp/domainwatch.pl
use strict;
use warnings;

my $basedir = '/tmp/domaincntrs';
my $ptrfile = $basedir."/logpos.txt";
my $logfile = '/tmp/domains.log';
my $logpos  = 0;

# Make sure the counter directory exists
mkdir($basedir) unless (-d $basedir);

# Get last log position from the pointer file; detect if the log has wrapped
if (open(INFILE, "<$ptrfile")) {
    $logpos = <INFILE> || 0;
    close(INFILE);
    $logpos = 0 if (!-e $logfile || $logpos > (-s $logfile));
}

# Open logfile, seek to the last position and begin reading
my %domainhash;
if (open(LOGFILE, "<$logfile")) {
    seek(LOGFILE, $logpos, 0);
    while (my $domain = <LOGFILE>) {
        chomp($domain);
        $domainhash{$domain}++;
    }
    $logpos = tell(LOGFILE);
    close(LOGFILE);
} else {
    print "could not open logfile $logfile: $!\n";
}

# Iterate over entries learned from the log and increment the counter file for each domain
foreach my $domain (keys(%domainhash)) {
    my $cntr = 0;
    # Get current counter
    if (open(CNTRFILE, "<".$basedir."/".$domain)) {
        $cntr = <CNTRFILE> || 0;
        close(CNTRFILE);
    }
    # Write new counter
    if (open(CNTRFILE, ">".$basedir."/".$domain)) {
        print CNTRFILE ($cntr + $domainhash{$domain});
        close(CNTRFILE);
    }
}

# Write current log position back to the pointer file
if (open(PTRFILE, ">$ptrfile")) {
    print PTRFILE $logpos;
    close(PTRFILE);
} else {
    print "could not write to pointerfile $ptrfile: $!\n";
}
And finally, a helper script which Squid can use to make policy decisions:
#!/usr/bin/perl
# File: /tmp/domaincheck.pl
use strict;
use warnings;

# Enable autoflush so Squid sees each response immediately
$| = 1;

my $basedir = '/tmp/domaincntrs';

# Loop forever, reading one request per line from Squid
while (my $line = <STDIN>) {
    chomp($line);
    my ($domain, $limit, $rest) = split(/\s+/, $line, 3);
    $limit = 100 unless (defined($limit) && int($limit));
    my $cntr = 0;
    if (open(INFILE, "<".$basedir."/".$domain)) {
        $cntr = <INFILE> || 0;
        close(INFILE);
    }
    chomp($cntr);
    # OK means the ACL matches (domain is over its limit); ERR means it does not
    my $resp = ($cntr > $limit) ? 'OK' : 'ERR';
    if (open(LOGFILE, ">>/tmp/domaincheck.log")) {
        print LOGFILE "domain=$domain limit=$limit cntr=$cntr resp=$resp\n";
        close(LOGFILE);
    }
    print "$resp\n";
}
Some tips for tuning: you are going to see a lot of hits in domains.log for access you might not expect, e.g. even for requests which are denied. You should consider setting up an ACL to define successful requests (e.g. HTTP status codes 200-299) and then apply that ACL to the access_log statement, to control what gets written to that file.
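For example (the ACL name success-codes is arbitrary; http_status is the Squid ACL type that matches on the reply status code, and an ACL appended to access_log filters what gets logged there):

```
acl success-codes http_status 200-299
access_log daemon:/tmp/domains.log logformat=domainonly success-codes
```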
The domaincheck.pl script has a default limit of 100, but will accept variable limits passed to it from the Squid configuration file. You can use this to specify multiple invocations of that ACL in your squid.conf, e.g.:
# Define our busy and quiet domains
acl busy-domains dstdomain .google.com .microsoft.com
acl quiet-domains dstdomain .centos.org .adobe.com
# Define some busy and quiet limits
acl busy-domains-limit external domaincheck 750
acl quiet-domains-limit external domaincheck 100
# Combine the domains and limits ACLs into policy rules to deny access when both conditions are true
http_access deny busy-domains busy-domains-limit
http_access deny quiet-domains quiet-domains-limit
Repeat this for as many domain/limit thresholds as you need.