Search code examples
phpmysqlcakephpcakephp-2.0internal-server-error

PHP Script Internal Server Error when lots of data


Summary

This is a script (CakePHP 2.10.18 - LAMP dedicated server with PHP 5.3) that loads information from 2 MySQL tables, and then does some process of the data to output it to excel.

Table 1 has users, and Table 2 has info about those users (one record per user). The script has the goal of grabbing the record of a user from Table 1, grabbing its related info from Table 2, and put it in an excel row (using PHPExcel_IOFactory library for this).

The information extracted of those tables is of around 8000 records from each, the tables themselves have 100K and 300K total records respectively. All the fields in those tables are ints and small varchars with the exception of one field in the second table (datos_progreso seen in the code below), which is a text field and contains serialized data, but nothing big.

The issue is that if I run the script for the full 16000 records I get an Internal Server Error (without really any explanation in the logs), if I run the script for 1000 records it all works fine, so this seems to point out it's a resources issue.

I've tried (among other things that I will explain at the end) increasing the memory_limit from 128M to 8GB (yes you read that right), max_execution_time from 90 to 300 seconds, and max_input_vars from 1000 to 10000, and that isn't solving the problem.

My thoughts are that the amount of data isn't that huge to cause the resources to run out, but I've tried optimizing the script in several ways and can't get it to work. The only time I get it to work is by running it on a small portion of the records like I mention above.

I would like to know if there's something script-wise or php-configuration-wise I can do to fix this. I can't change the database tables with the information by the way.

Code

This is just the relevant bits of code that I think matter, the script is longer:

       $this->Usuario->bindModel(
            array('hasMany' => array(
                'UsuarioProgreso' => array('className' => 'UsuarioProgreso', 'foreignKey' => 'id_usuario', 'conditions' => array('UsuarioProgreso.id_campania' => $id_campania)))
                ));
    
       $usuarios = $this->Usuario->find('all', array(
            'conditions'=>array('Usuario.id_campania'=>$id_campania, 'Usuario.fecha_registro >'=>'2020-05-28'),
            'fields'=>array('Usuario.id_usuario', 'Usuario.login', 'Usuario.nombre', 'Usuario.apellido', 'Usuario.provincia', 'Usuario.telefono', 'Usuario.codigo_promocion'),
            'order'=>array('Usuario.login ASC')
        ));

        $usuario = null;
        $progreso_usuario = null;
        $datos_progreso = null;
        $i = 2;

        
        foreach ($usuarios as $usuario) {        
    
                if (isset($usuario['UsuarioProgreso']['datos_progreso'])) {
    
                    $datos_progreso = unserialize($progreso['UsuarioProgreso']['datos_progreso']);
    
                    $unit = 1;
                    $column = 'G';
    
                    while ($unit <= 60) {
    
                        if (isset($datos_progreso[$unit]['punt']))
                            $puntuacion = $datos_progreso[$unit]['punt'];
                        else
                            $puntuacion = ' ';
                        
                        $objSheet->getCell($column.$i)->setValue($puntuacion);
    
                        $column++;
                        $unit++;
                    }
    
                    $nivel = 1;
                    $unidad_nivel = array(1 => 64, 2 => 68, 3 => 72, 4 => 76, 5 => 80, 6 => 84);
    
                    while ($nivel <= 6) {
    
                        $unidad = $unidad_nivel[$nivel]; 
    
                        if (isset($datos_progreso[$unidad]['punt']))
                            $puntuacion = $datos_progreso[$unidad]['punt'];
                        else
                            $puntuacion = ' ';
                        
                        $objSheet->getCell($column.$i)->setValue($puntuacion);
    
                        $column++;
                        $nivel++;
                    }
                }
    
                //Free the variables
                $usuario = null;
                $progreso_usuario = null;
                $datos_progreso = null;

            $i++;
        }     

What I have tried

I have tried not using bindModel, and instead just load the information of both tables separately. So loading all the info of users first, looping through it, and on each loop grab the info for that specific user from Table 2.

I have tried also something similar to the above, but instead of loading all the info at once for the users from Table 1, just load first all their IDs, and then loop through those IDs to grab the info from Table 1 and Table 2. I figured this way I would use less memory.

I have also tried not using CakePHP's find(), and instead use fetchAll() with "manual" queries, since after some research it seemed like it would be more efficient memory-wise (didn't seem to make a difference)


If there's any other info I can provide that can help understand better what's going on please let me know :)


EDIT:

Following the suggestions in the comments I've implemented this in a shell script and it works fine (takes a while but it completes without issue).

With that said, I would still like to make this work from a web interface. In order to figure out what's going on, and since the error_logs aren't really showing anything relevant, I've decided to do some performance testing myself.

After that testing, these are my findings:

  • It's not a memory issue since the script is using at most around 300 MB and I've given it a memory_limit of 8GB
  • The memory usage is very similar whether it's via web call or shell script
  • It's not a timeout issue since I've given the script 20 minutes limit and it crashes way before that

What other setting could be limiting this/running out that doesn't fail when it's a shell script?


Solution

  • The way I solved this was using a shell script by following the advice from the comments. I've understood that my originally intended approach was not the correct one, and while I have not been able to figure out what exactly was causing the error, it's clear that using a web script was the root of the problem.