Good morning, Stack. I'm trying to figure out a way to predict trends in data, and I'm wondering if there is a better way to do it than my attempt below. Are there any built-in functions or libraries I can look into?
Here's what I've got: (http://3v4l.org/RGU3i)
$PopulationOfTexas = array(
1999 => 20.56, // in millions
2000 => 21.56,
2001 => 22.56,
2002 => 23.56
);
//generate an array showing the difference in each year compared to the previous year
$differneces = array();
$lastyear = null;
foreach($PopulationOfTexas as $k=>$v){
if(empty($lastyear)){$lastyear = $k; continue;}
$differneces[$k] = $k - $lastyear;
$lastyear = $k;
//use this later
$lastitem = array("year"=>$k, "data"=>$v);
}
//get the average difference per year
$count = 0;
$total = 0;
foreach($differneces as $k=>$v){
$count++;
$total += $v;
}
$average = number_format(($total/$count), 2);
//make a prediction
$predictions = array();
for($i=0;$i<5;$i++){
$year = isset($year) ? $year+1 : $lastitem["year"]+1;
$prediction = isset($prediction) ? $prediction+floatval($average) : $lastitem["data"]+floatval($average);
$predictions[$year] = $prediction;
}
print_r($predictions);
The algorithm is broken because it calculates the average increase of the array keys (the years: 1999, 2000, etc.) rather than the array values (the population), so with consecutive years the result is always 1.
This is masked by the fact that your sample population also increases by exactly one each year. If you had added more variation, you would probably have spotted the error. To fix it, compare the values instead of the keys:
foreach($PopulationOfTexas as $k=>$v){
//$lastyear now holds the previous year's value (population), not its key
if($lastyear === null){$lastyear = $v; continue;}
$differneces[$k] = $v - $lastyear;
$lastyear = $v;
//use this later
$lastitem = array("year"=>$k, "data"=>$v);
}
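To make the difference visible, here is a minimal sketch that runs both versions of the calculation on data with uneven growth. The figures in `$sample` are made up for illustration (they are not real population data), and the helper names `keyChanges` and `valueChanges` are hypothetical, not part of your code.

```php
<?php
// Hypothetical sample data with uneven growth (not real figures).
$sample = array(
    1999 => 20.0,
    2000 => 21.0,
    2001 => 23.0,
    2002 => 26.0,
);

// Differences between consecutive array KEYS (the original bug).
function keyChanges(array $series) {
    $changes = array();
    $previous = null;
    foreach ($series as $year => $value) {
        if ($previous !== null) {
            $changes[$year] = $year - $previous;
        }
        $previous = $year;
    }
    return $changes;
}

// Differences between consecutive array VALUES (the fix).
function valueChanges(array $series) {
    $changes = array();
    $previous = null;
    foreach ($series as $year => $value) {
        if ($previous !== null) {
            $changes[$year] = $value - $previous;
        }
        $previous = $value;
    }
    return $changes;
}

print_r(keyChanges($sample));   // every entry is 1, whatever the data
print_r(valueChanges($sample)); // 1, 2 and 3: the actual yearly growth
```

With uneven data the key-based version still reports 1 for every year, which is how the bug hides behind your original sample.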
More generally, the algorithm is very simplistic: it applies the same average yearly change to every future year, so it can only ever predict a straight-line increase or decrease.
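One way to see why: the differences between consecutive values cancel out when you add them up, so their average is just (last value - first value) divided by the number of gaps. The corrected algorithm therefore only looks at the first and last data points. A compact sketch of that, assuming at least two data points and one entry per consecutive year (the function name `predictFlat` and the sample figures are hypothetical):

```php
<?php
// Extrapolate $steps years ahead using the average year-on-year change.
function predictFlat(array $series, $steps) {
    $years  = array_keys($series);
    $values = array_values($series);
    $n = count($values);
    if ($n < 2) {
        return array(); // not enough data to estimate a change
    }

    // The sum of consecutive differences telescopes to (last - first),
    // so the average change needs only the two end points.
    $average = ($values[$n - 1] - $values[0]) / ($n - 1);

    $predictions = array();
    $year  = $years[$n - 1];
    $value = $values[$n - 1];
    for ($i = 0; $i < $steps; $i++) {
        $year++;
        $value += $average;          // same step every year
        $predictions[$year] = $value;
    }
    return $predictions;
}

// Hypothetical sample data with uneven growth (not real figures).
$sample = array(1999 => 20.0, 2000 => 21.0, 2001 => 23.0, 2002 => 26.0);
print_r(predictFlat($sample, 3)); // 2003 => 28, 2004 => 30, 2005 => 32
```

Note that the sketch keeps the average as a float instead of passing it through `number_format()`, which returns a string with thousands separators and is better reserved for display.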