I've written my first functional PHP webapp called Heater. It presents interactive calendar heatmaps using the Google Charts libraries and a AWS Redshift backend.
Now that I have it working, I've started improving the performance. I've installed APC and verified it is working.
My question is how do I enable query caching in front of Redshift?
Here's an example of how I'm loading data for now:
getRsData.php:
<?php
$id=$_GET["id"];
$action=$_GET["action"];
$connect = $rec = "";
$connect = pg_connect('host=myredshift.redshift.amazonaws.com port=5439 dbname=mydbname user=dmourati password=mypasword');
if ($action == "upload")
$rec = pg_query($connect,"SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
...
?>
Some of the queries take > 5 seconds which negatively impacts the user experience. The data is slow moving as in it updates only once per day. I'd like to front the Redshift query with a local APC cache and then invalidate it via cron (or some such) once a day to allow for the newer data to flow in. I'd eventually like to create a cache warming script but that is not necessary at this time.
Any pointers or tips to documentation are helpful. I've spent some time googling but most docs out there are just about document caching not query caching if that makes sense. This is a standalone host running AWS Linux and PHP 5.3 with apc-3.1.15.
Thanks.
EDIT to add input validation
if (!preg_match("/^[0-9]*$/",$id)) {
$idErr = "Only numbers allowed";
}
if (empty($_GET["action"])) {
$actionErr = "Action is required";
} else {
$action = test_input($action);
}
function test_input($data) {
$data = trim($data);
$data = stripslashes($data);
$data = htmlspecialchars($data);
return $data;
}
It doesn't seem APC is needed for this since you're caching data for a day which is relatively long.
The code below caches your query results in a file ($cache_path
). Before querying redshift it checks whether a cache file for the given enterprise id exists and was created the same day. If it does and if the code can successfully retrieve the cache then the rows are returned from the cache but if the file doesn't exist or the rows can't be retrieved from the cache, the code will query the db and write the cache.
The results of the query/cache are returned in $rows
<?php
$id=$_GET["id"];
$action=$_GET["action"];
$connect = $rec = "";
$connect = pg_connect('host=myredshift.redshift.amazonaws.com port=5439 dbname=mydbname user=dmourati password=mypasword');
if ($action == "upload") {
$cache_path = "/my_cache_path/upload_count/$id";
if(!file_exists($cache_path)
|| date('Y-m-d',filemtime($cache_path)) < date('Y-m-d')
|| false === $rows = unserialize(file_get_contents($cache_path))) {
$rows = array();
$rec = pg_query($connect,"SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
while($r = pg_fetch_assoc($rec)) {
$rows[] = $r;
}
file_put_contents($cache_path,serialize($rows));
}
}
?>