Search code examples
phpsql-servercsvfgetcsv

How to improve the speed of insertion of the csv data in a database in php?


I wrote this code that allows to read a folder, depending on the file name is inserted in a different table the data of the csv file. Once the file is processed it is moved to another folder. My code works well but the second csv file is more than 80 000 lines long and it takes several hours to be integrated in my database.

How can I improve the performance of my code? I tried 'LOAD DATA LOCAL' but without success...

<?php

include("../connexion.php");

ini_set('max_execution_time', 54000);
$timestamp= date("y-m-d H:i");

$dir   = 'D:/xampp/htdocs/retail_BI/test/';
$allFiles = scandir($dir);
$dest = 'D:/xampp/htdocs/retail_BI/test/files/';

   foreach($allFiles as $file) {

        if (!in_array($file,array(".","..")))
      { 
  
        $file = $dir.$file;
        $filename = basename( $file );
        
       if ( strpos( $filename, 'BI1_' ) === 0 ) {
        if (($handle = fopen("$filename", "r")) !== false) {
            
            //To remove BOM in the fist cell
              fseek($handle, 3);
            
            while (($data = fgetcsv($handle, 9000000, ";")) !== false) {
                if (empty(array_filter($data))) {
                    echo "not good";
                continue;
                }
 
            $date = DateTime::createFromFormat('d/m/Y H:i:s A', $data[2]);
            if ($date === false) {
            break;
            }

            $date1 = $date->format('Y-m-d'); // 2020-07-07
            $date2 = $date->format('Hi A'); // 1247 AM
            
              //database entry               
              $query = "insert into dbo.Y2_Sales (storenumber, storename, date, time, TransRef, stylecode, color, size, quantity, unit_price, SalesExGST, cost, currency) 
               values('$data[0]', '$data[1]','$date1','$date2','$data[3]','$data[4]','$data[5]','$data[6]','$data[7]', '$data[8]','$data[9]','$data[10]','$data[11]')";
                   $stmt = $conn->query( $query );
                     if (!$stmt) { echo $conn->error;}                                          
                }       
            }
                fclose($handle);
                
            //Moving the file to another folder             
            if(!rename($file, $dest . $filename)) { 
                echo "error copy";
                } 
                    
           } else if ( strpos( $filename, 'BI2_' ) === 0 ) {
    
              if (($handle = fopen("$filename", "r")) !== false) {
                  
                 //To remove BOM in the fist cell
                  fseek($handle, 3);
            
            while (($data = fgetcsv($handle, 9000000, ";")) !== false) {        
                 // print_r($data);
                if (empty(array_filter($data))) {
                continue;
                }
                 //database entry    
               $query = "insert into dbo.Y2_Inventory (storenumber, stylecode, colour, size, units, timestamp) 
               values('$data[0]', '$data[1]','$data[2]','$data[3]','$data[4]', '$timestamp')";
                   $stmt = $conn->query( $query );
                   if (!$stmt) { echo $conn->error;}
                               
                        // $query = "LOAD DATA LOCAL INFILE '$filename' INTO TABLE dbo.Y2_Inventory FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n' (storenumber, stylecode, colour, size, units, timestamp)";
                         // $stmt = $conn->query( $query );
            }               
         }
            fclose($handle);

            //Moving the file to another folder             
            if(!rename($file, $dest . $filename)) { 
                            echo "error copy";
                            } 

       } 
    }
  }
  
    echo "good !";

?>

Solution

  • Instead of inserting data into database for every row, try inserting in batches.

    You can always do a bulk insert, that can take n(use 1000) number of entries and insert it into the table.

    https://www.mysqltutorial.org/mysql-insert-multiple-rows/

    This will result in reduction of the DB calls, thereby reducing the overall time.

    And for 80k entries there is a possibility that you might exceed the memory limit too.

    You can overcome that using generators in php. https://medium.com/@aashish.gaba097/database-seeding-with-large-files-in-laravel-be5b2aceaa0b

    Although, this is in Laravel, but the code that reads from csv is independent (the one that uses generator) and the logic can be used here.