php, arrays, csv, coredump

PHP won't read full file into array, only partial


I have a file with 3,200,000 lines of csv data (with 450 columns). Total file size is 6 GB.

I read the file like this:

$data = file('csv.out');

Without fail, it reads only 897,000 lines. I confirmed this with 'print_r' and echo sizeof($data). I increased my "memory_limit" to a ridiculous value (80 GB), but it made no difference.
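
One way to cross-check the 897,000 figure independently of file() is to count lines while streaming the file, so nothing large is held in memory. A minimal sketch, assuming PHP 7 or later; countLines is a hypothetical helper name, not part of the original code:

```php
<?php
// Hypothetical helper (not from the original post): counts lines by
// streaming, so the file is never held in memory as a whole.
function countLines(string $path): int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $count = 0;
    while (fgets($handle) !== false) {
        $count++;
    }
    fclose($handle);
    return $count;
}

// Small demonstration on a temporary file; for the real check, pass 'csv.out'.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "a,b\nc,d\ne,f\n");
echo countLines($tmp), "\n";     // 3
echo count(file($tmp)), "\n";    // 3
unlink($tmp);
```

If the two numbers differ for the real file, the lines are being lost inside file() rather than in the data itself.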

Now, it DID read in my other large file, which has the same number of lines (3,200,000) but only a few columns, for a total file size of 1.1 GB. So it appears to be a total file size issue. FYI, the 897,000 lines in the $data array come to around 1.68 GB.

Update: I increased the second (longer) file to 2.1 GB (over 5 million lines) and it reads in fine, yet the other file is still truncated at 1.68 GB. So it does not appear to be a size issue. If I increase the second file further, to 2.2 GB, PHP does not truncate it and continue the program (as it does for the first file); instead it dies and dumps core.

Update: I verified my system is 64 bit by printing integer and float numbers:

<?php
$large_number = 2147483647;
var_dump($large_number);                     // int(2147483647)

$large_number = 2147483648;
var_dump($large_number);                     // float(2147483648) on 32-bit, int on 64-bit

$million = 1000000;
$large_number = 50000 * $million;
var_dump($large_number);                     // float(50000000000) on 32-bit, int on 64-bit

$large_number = 9223372036854775807;
var_dump($large_number);                     // int(9223372036854775807)

$large_number = 9223372036854775808;
var_dump($large_number);                     // float(9.2233720368548E+18)

$million = 1000000;
$large_number = 50000000000000 * $million;
var_dump($large_number);                     // float(5.0E+19)

print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";

?>

The output from this script is:

int(2147483647)
int(2147483648)
int(50000000000)
int(9223372036854775807)
float(9.2233720368548E+18)
float(5.0E+19)
PHP_INT_MAX: 9223372036854775807
PHP_INT_SIZE: 8 bytes (64 bits)

So, since the system is 64-bit and the memory limit is set really high, why is PHP not reading files larger than 2.15 GB?
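
For reference, the 2.15 GB figure can be compared with the largest signed 32-bit integer. The sketch below is only arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes). It does not establish that this limit is the cause, and it does not account for the 1.68 GB truncation of the first file.

```php
<?php
// Largest value of a signed 32-bit integer, taken here as a byte count.
$int32Max = 2 ** 31 - 1;                  // 2147483647

// Expressed in decimal gigabytes (assumption: 1 GB = 10^9 bytes).
printf("%.3f GB\n", $int32Max / 1e9);     // 2.147 GB

// The file sizes reported above, in the same unit.
var_dump(2.1e9 < $int32Max);   // bool(true): the 2.1 GB file is below the limit
var_dump(2.2e9 > $int32Max);   // bool(true): the 2.2 GB file is above it
```

The second file reading fine at 2.1 GB and crashing at 2.2 GB is at least consistent with a boundary near 2^31 bytes.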


Solution

  • I fixed it. All I had to do was change the way I read the file. Why this works, I do not know.

    Old code that only reads 2.15 GB out of 6.0 GB:

    $data = file('csv.out'); 
    

    New code that reads the full 6.0 GB:

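    $data = array();

    $i = 1;
    $handle = fopen('csv.out', 'r');   // fopen() requires a mode argument

    if ($handle) {
        while (($line = fgets($handle)) !== false) {
            // process the line read
            $data[$i] = $line;         // store only real lines, not the final false
            $i++;
        }
        fclose($handle);
    }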
    

    Feel free to shed some light on why. There must be some limitation when using

    $var=file();
    

    Interestingly, 2.15 GB is close to the 32-bit limit I read about: a signed 32-bit integer tops out at 2,147,483,647, which as a byte count is about 2.15 GB.
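
If the lines only need to be processed one at a time, the array can be skipped entirely. A minimal sketch of the same fgets() loop wrapped in a generator, assuming PHP 7 or later; readLines is a hypothetical helper name, and the explode() call assumes the CSV has no quoted fields containing commas:

```php
<?php
// Hypothetical helper (not from the original post): yields one line at a
// time so only the current line is held in memory.
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");   // strip the line terminator
        }
    } finally {
        fclose($handle);
    }
}

// Demonstration on a small temporary file; the real input would be 'csv.out'.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "1,2,3\n4,5,6\n");
$rows = 0;
foreach (readLines($tmp) as $line) {
    $fields = explode(',', $line);   // assumes no quoted commas in the data
    $rows++;
}
unlink($tmp);
echo $rows, "\n";   // 2
```

This keeps memory use roughly constant regardless of file size, at the cost of not being able to index back into earlier lines.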