I'm working with a database that holds a large number of URLs (tens of thousands). I'm attempting to multi-thread a resolver that simply tries to resolve a given domain. On success, it compares the result to what's currently in the database and updates the record if it differs. On failure, the record is updated as well.
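For context, the per-domain step I have in mind looks roughly like this (an untested sketch assuming Net::DNS and DBI; the `urls` table and its columns are made up for illustration):

use Net::DNS;
use DBI;

# Resolve one domain and sync the result into the database.
# Table/column names (urls, domain, resolved_ip) are made up.
sub resolve_and_update {
    my ($dbh, $domain, $current_ip) = @_;

    my $resolver = Net::DNS::Resolver->new;
    my $reply    = $resolver->query($domain, 'A');

    my $new_ip;
    if ($reply) {
        for my $rr ($reply->answer) {
            next unless $rr->type eq 'A';
            $new_ip = $rr->address;
            last;
        }
    }

    # Update on failure (undef) or when the address has changed.
    if (!defined $new_ip || !defined $current_ip || $new_ip ne $current_ip) {
        $dbh->do(
            'UPDATE urls SET resolved_ip = ? WHERE domain = ?',
            undef, $new_ip, $domain,
        );
    }
}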
Naturally, this will produce an inordinate volume of database calls. Since I'm still fairly new to Perl, I have the following questions about the best way to achieve some form of asynchronous load distribution.
I've been playing with a more Pythonic approach (given that I have more experience in Python), but I have yet to make it work because it doesn't block for some reason. Aside from that issue, threading isn't the best option simply because each thread gets so little CPU time (plus, I've been crucified more than once in the Perl channel for using threads :P and for good reason).
Below is more or less the pseudo-code I've been playing with for my threads (it should be read as a supplement to my explanation of what I'm trying to accomplish, rather than anything else).
# Create children...
for (my $i = 0; $i < $threads_to_spawn; $i++) {
    threads->create(\&worker);
}
The parent then sits in a loop, monitoring a shared array of domains. It locks and re-populates it if it becomes empty.
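Roughly, what I mean by that is something like the following (untested sketch with no shutdown handling; `get_next_batch` stands in for however the domains get pulled from the database):

use threads;
use threads::shared;

my @domains :shared;    # work list shared between the parent and the workers

# Worker: take one domain at a time under the lock.
sub worker {
    while (1) {
        my $domain;
        {
            lock(@domains);
            $domain = shift @domains;
        }
        if (!defined $domain) {
            sleep 1;    # nothing queued yet; wait for the parent to refill
            next;
        }
        # resolve $domain and update the row here
    }
}

# Parent: re-populate the array whenever it drains.
sub refill_if_empty {
    lock(@domains);
    if (!@domains) {
        push @domains, get_next_batch();    # placeholder for the DB fetch
    }
}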
Your code is the start of a persistent worker model.
use threads;
use Thread::Queue 1.03 qw( );

use constant NUM_WORKERS => 5;

sub work {
    my ($dbh, $job) = @_;
    ...
}

{
    my $q = Thread::Queue->new();

    # Spawn persistent workers. Each keeps its own $dbh and pulls
    # jobs from the queue until the queue is ended.
    for (1..NUM_WORKERS) {
        async {
            my $dbh = ...;
            while (my $job = $q->dequeue()) {
                work($dbh, $job);
            }
        };
    }

    for my $job (...) {
        $q->enqueue($job);
    }

    # Signal that no more jobs are coming, then wait for the workers.
    $q->end();

    $_->join() for threads->list();
}
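For illustration only, here is one way the `...` placeholders might be filled in; the DSN and credentials below are made-up placeholders, and the important point is that each worker opens its own DBI handle, since DBI connections cannot be shared between threads:

use DBI;

# Illustration only: each worker creates its own connection, because
# DBI handles must not be shared across threads. The DSN and
# credentials are placeholders.
sub connect_db {
    return DBI->connect(
        'dbi:mysql:database=mydb;host=localhost',   # placeholder DSN
        'user', 'password',                         # placeholder credentials
        { RaiseError => 1, AutoCommit => 1 },
    );
}

# Inside the async block above, that would look like:
#   my $dbh = connect_db();
#   while (my $job = $q->dequeue()) {
#       work($dbh, $job);
#   }

Because each worker connects once and holds the handle for its lifetime, you avoid per-job reconnect overhead, which also helps with the volume of database calls you mentioned.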
Performance tips: