Tags: php, wordpress, batch-processing, bulk-load

How to avoid a PHP/WordPress fatal memory error when bulk-processing data from large file uploads?


I have a large CSV file which I am uploading to the WordPress dashboard in order to import taxonomy terms. I wrote a small plugin which uses the wp_insert_term() function to insert each term; however, the function caches a lot of its data in order to check slug uniqueness and parent term dependencies, so the process runs out of memory at around 1000 terms, despite the memory allocation being increased to 0.5 GB.
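
For reference, the import loop is essentially the following (a simplified sketch; the function name, column layout and taxonomy handling are placeholders, not the actual plugin code):

    function my_import_terms( $filepath, $taxonomy ) {
        $handle = fopen( $filepath, 'r' );
        if ( false === $handle ) {
            return;
        }
        while ( ( $row = fgetcsv( $handle ) ) !== false ) {
            // Each call adds to WordPress's internal term caches (slug
            // uniqueness and parent lookups), so memory grows per term.
            wp_insert_term( $row[0], $taxonomy, array( 'slug' => $row[1] ) );
        }
        fclose( $handle );
    }

(wp_defer_term_counting( true ) may reduce the cost of the term-count updates during bulk inserts, but it does not address the caching described above.)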

I would like to split the file into manageable chunks so as to batch-process the data in sessions limited to 1000 or so lines; that way each process would terminate cleanly.
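
As an illustration of the idea, a chunked session could seek to a line offset in the stored upload and stop after a fixed number of lines. A minimal sketch, where my_process_csv_chunk, the file path handling and the taxonomy are assumptions rather than working plugin code:

    function my_process_csv_chunk( $filepath, $offset, $chunk_size = 1000 ) {
        $file = new SplFileObject( $filepath, 'r' );
        $file->seek( $offset ); // position at the first unprocessed line.
        for ( $i = 0; $i < $chunk_size && ! $file->eof(); $i++, $file->next() ) {
            $line = trim( (string) $file->current() );
            if ( '' === $line ) {
                continue;
            }
            $row = str_getcsv( $line );
            wp_insert_term( $row[0], 'my_taxonomy' ); // placeholder taxonomy.
        }
        return $offset + $i; // the offset to resume from in the next session.
    }

Each chunk would run in its own request, so the process (and its caches) is torn down cleanly between sessions.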

I have been looking around for such a solution and found this interesting article about a similar issue faced by bulk image imports; it outlines how the developers used JavaScript to control the batch process by sending AJAX requests to the server in manageable chunks.

It gave me the idea of reading the CSV file in the browser on upload, line by line, and sending AJAX requests to the server, each processing a manageable number of lines.

Is there a better way to achieve this?


Solution

  • I developed the following solution based on the links in the question and some additional tinkering.

    On the WordPress server side, when enqueueing the JavaScript file, I determine the number of lines the server can handle, based on the PHP memory allocation:

    $limit = ini_get( 'memory_limit' );
    $limit = wp_convert_hr_to_bytes( $limit ) / MB_IN_BYTES; // in MB.
    // Choose a batch size the server can handle at this memory limit.
    switch ( true ) {
        case $limit >= 512:
            $limit = 1000;
            break;
        default:
            $limit = 500;
            break;
    }
    // The script handle must be registered with a source URL;
    // the path here is illustrative.
    wp_enqueue_script( 'my-javascript-file', plugins_url( 'js/my-javascript-file.js', __FILE__ ), array( 'jquery' ) );
    // Expose the limit to the script as cirData.limit.
    wp_localize_script( 'my-javascript-file', 'cirData', array(
        'limit' => $limit,
    ) );
    

    You should determine and set your own limit as appropriate for your process.

    In the JavaScript file, using jQuery:

    var reader, formData, lineMarker = 0, csvLines, isEOF = false, $file, $form;
    $(document).ready(function(){
      $file = $(':file'); // the file input field.
      $form = $('form');  // the upload form.
      // When the file field changes, prepare a reader for the chosen file.
      $file.on('change', function(){
        if($file.val()){
          reader = new FileReader();
          // When the file has been read, split it into lines and start batching.
          reader.addEventListener('load', function(e){
            csvLines = e.target.result.split("\n");
            batchProcess(); // launch the batch process.
          });
        }
      });
      // When the form is submitted, start reading the file. Bound once,
      // outside the change handler, so repeated file selections do not
      // stack duplicate submit handlers.
      $(document).on('click', ':submit', function(e){
        e.preventDefault(); // disable the normal submit.
        if(!reader){ return; } // no file chosen yet.
        // Set up the data to send with the AJAX requests.
        formData = new FormData($form.get(0));
        // Read the file as text (readAsBinaryString is deprecated),
        // then batch the requests to the server.
        reader.readAsText($file.get(0).files[0]);
      });
    });
    
    // Methods
    // Post one chunk of CSV lines to the server.
    function postCSVdata(csvdata){
      formData.set('csvlines', csvdata); // set the current chunk to send.
      $.ajax({
        type: 'POST',
        url: $form.attr('action'),
        data: formData,
        contentType: false,
        processData: false,
        cache: false,
        success: function(data){
          if(isEOF){ // was this the end of the file?
            console.log("success!");
          }else{ // continue reading the file.
            console.log("uploaded: " + Math.round(lineMarker / csvLines.length * 100) + "%");
            batchProcess(); // process the next part of the file.
          }
        }
      });
    }
    // Batch process: send the next chunk of lines to the server.
    function batchProcess(){
      // csvLines holds every line read from the file;
      // lineMarker is the index of the next unsent line.
      var parsedata = '', remaining = csvLines.length - lineMarker, i;
      // Take at most cirData.limit lines, the per-request limit set by the server.
      for(i = 0; i < remaining && i < cirData.limit; i++){
        parsedata += csvLines[lineMarker + i] + "\n"; // newline for the server to split on.
      }
      lineMarker += i;
      if(lineMarker >= csvLines.length) isEOF = true;
      postCSVdata(parsedata); // send the chunk to the server.
    }
    

    This sends multiple AJAX requests sequentially, in chunks of lines that the server can handle without hitting a fatal memory error.
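
    Note that the answer above does not show the receiving endpoint. Assuming the form's action points at admin-ajax.php, a minimal sketch of a server-side handler might look like the following; the my_csv_import action name, the capability check and the taxonomy are illustrative assumptions, not part of the original plugin:

    // Hypothetical admin-ajax handler for one posted chunk.
    add_action( 'wp_ajax_my_csv_import', 'my_csv_import_handler' );
    function my_csv_import_handler() {
        if ( ! current_user_can( 'manage_categories' ) ) {
            wp_send_json_error( 'forbidden', 403 );
        }
        $lines = explode( "\n", wp_unslash( $_POST['csvlines'] ?? '' ) );
        foreach ( $lines as $line ) {
            $line = trim( $line );
            if ( '' === $line ) {
                continue;
            }
            $row = str_getcsv( $line );
            wp_insert_term( $row[0], 'my_taxonomy' ); // placeholder taxonomy.
        }
        // Each request handles only one manageable chunk, so memory is
        // released when the process exits.
        wp_send_json_success();
    }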