php arrays readfile chunks

Cannot populate an array by reading 500,000 entries from a file for memory reasons


I am parsing about 500,000 entries into an array $properties:

$properties = array();
$handle = fopen($file_path, "r");
if ($handle) {
    while (($str = fgets($handle)) !== false) {
        if (strlen($str) && $str[0] == '#') {
            $pdate = substr($str, 1);
            $date = rtrim($pdate);
            $formatted = DateTime::createFromFormat('* M d H:i:s T Y', $date);
        }
        $str = rtrim ($str, "\n");
        $exp = explode ('=', $str);
        if (count($exp) == 2){
            $exp2 = explode('.', $exp[0]);  
            if (count($exp2) == 2) {
                if ($exp2[1] == "dateTime") {
                    $s = str_replace("\\", "", $exp[1]);
                    $d = strtotime($s);
                    $dateTime = date('Y-m-d H:i:s', $d);
                    $properties[$exp2[0]][$exp2[1]] = $dateTime;
                } else {
                    $properties[$exp2[0]][$exp2[1]] = $exp[1];
                }
            } else {
                $properties[$exp[0]] = $exp[1];
            }
        } 
    }
    fclose($handle);
} else {
    echo "error";
}

This works so far, but I need to split the array into chunks, because otherwise it is too big to work with.

$properties_chunk = array_chunk($properties, 10000, true);

But now the $properties_chunk array is never created: the script needs too much memory and the system crashes. What can I do?

The array should look like this in the end:

array(4) {
  [0]=>
  array(10000) {
    ["12345"]=>
    array(5) {
      ["dateTime"]=>
      string(19) "2016-10-12 19:46:25"
      ["fileName"]=>
      string(46) "monkey.jpg"
      ["path"]=>
      string(149) "Volumes/animals/monkey.jpg"
      ["size"]=>
      string(7) "2650752"
    }
    ["678790"]=>
    array(5) {
      ["dateTime"]=>
      string(19) "2016-10-12 14:39:43"
      ["fileName"]=>
      string(45) "elephant.jpg"
      ["path"]=>
      string(171) "Volumes/animals/elephant.jpg"
      ["size"]=>
      string(7) "2306688"
    }

... and so on.

Solution

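  • If you use array_splice, the items are moved out of the input array into the resulting array, so the input shrinks as the result grows.

    That should keep the overall memory use roughly constant while the chunks are built.

    The result is chunked like with array_chunk, except that integer keys are renumbered rather than preserved, and it should be less memory hungry.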

    $arr = [1,2,3,4,5,6,7,8,9,10];
    $n = 10000; // chunk size
    $res = [];
    $count = (count($arr)/$n)-1; // do not splice the last items in the loop
    for($i=0; $i<$count; $i++){
        $res[] = array_splice($arr, 0, $n); // move the first $n items into a new chunk
    }
    $res[] = array_splice($arr, 0, count($arr));
    // Here we splice the remaining items from $arr into $res.
    // Notice that we splice count($arr) items instead of $n.
    // If count($arr) is an exact multiple of $n, this last chunk is full;
    // otherwise it is simply shorter, and no empty chunk is added.

    var_dump($res, $arr); // $res has all values, $arr is empty
    

    https://3v4l.org/XDpdI
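
The same idea can be wrapped in a small function. This is only a sketch: the name chunk_by_splice and the sample data are made up for illustration and do not come from the question or the answer. It loops until the input is empty instead of computing the number of chunks up front, and the second call shows that integer keys are renumbered inside each chunk, which matters if the IDs used as keys in $properties are numeric.

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): moves the items of
 * $arr into chunks of at most $n items, emptying $arr as it goes.
 */
function chunk_by_splice(array &$arr, int $n): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }
    $chunks = [];
    // array_splice removes the first $n items from $arr and returns them,
    // so the loop ends once $arr is empty; the last chunk may be shorter.
    while ($arr !== []) {
        $chunks[] = array_splice($arr, 0, $n);
    }
    return $chunks;
}

$items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
$chunks = chunk_by_splice($items, 3);
// $chunks now holds four chunks (3 + 3 + 3 + 1 items) and $items is empty.

// Integer keys are not preserved: each chunk is renumbered from 0.
$byId = [12345 => 'monkey.jpg', 678790 => 'elephant.jpg'];
$byIdChunks = chunk_by_splice($byId, 1);
```

If the original keys are needed afterwards, they would have to be stored inside each entry (for example as an extra "id" field) before splicing, since array_chunk's preserve_keys option has no counterpart here.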