Break A Large File Into Many Smaller Files With PHP


I have a 209MB .txt file with about 95,000 lines that is automatically pushed to my server once a week to update some content on my website. The problem is I cannot allocate enough memory to process such a large file, so I want to break the large file into smaller files with 5,000 lines each.

I cannot use file() at all until the file is broken into smaller pieces, so I have been working with SplFileObject. But I have gotten nowhere with it. Here's some pseudocode of what I want to accomplish:

read the file contents

while there are still lines left to be read in the file
    create a new file
    write the next 5000 lines to this file
    close this file

for each file created
    run mysql update queries with the new content

delete all of the files that were created

The file is in csv format.
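The pseudocode above can be sketched in PHP roughly as follows. This is a minimal sketch that assumes the chunk files may be written to a temporary directory. `splitFile()`, the `chunk_NNNN.txt` naming and the demo data are hypothetical. It reads with plain `fopen()`/`fgets()` rather than SplFileObject, so only one line is held in memory at a time.

```php
<?php
// Hypothetical helper: split $source into files of at most $linesPerChunk lines.
// Only one line is held in memory at a time, so the large file never has to fit in RAM.
function splitFile(string $source, string $targetDir, int $linesPerChunk = 5000): array
{
    $in = fopen($source, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source");
    }

    $chunks = [];        // paths of the chunk files written so far
    $out = null;         // handle of the chunk currently being written
    $linesInChunk = 0;

    while (($line = fgets($in)) !== false) {
        // Open a new chunk when none is open yet or the current one is full.
        if ($out === null || $linesInChunk === $linesPerChunk) {
            if ($out !== null) {
                fclose($out);
            }
            $path = sprintf('%s/chunk_%04d.txt', $targetDir, count($chunks));
            $out = fopen($path, 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot create $path");
            }
            $chunks[] = $path;
            $linesInChunk = 0;
        }
        fwrite($out, $line);
        $linesInChunk++;
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);

    return $chunks;
}

// Demo on a small hypothetical sample (12 lines, chunks of 5) instead of the real upload.
$dir = sys_get_temp_dir() . '/split_demo_' . uniqid();
mkdir($dir);
$source = "$dir/content.txt";
$sample = '';
for ($i = 1; $i <= 12; $i++) {
    $sample .= "row$i|value$i\n";
}
file_put_contents($source, $sample);

$chunks = splitFile($source, $dir, 5);

$lineCounts = [];
foreach ($chunks as $chunk) {
    // A chunk is small enough for file(); the MySQL updates would go here.
    $lineCounts[] = count(file($chunk));
    unlink($chunk); // delete each chunk once it has been processed
}
unlink($source);
rmdir($dir);
```

With 5,000 as the chunk size, the 95,000-line file would yield 19 chunk files.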

EDIT: Here is my solution for reading the file by line number, based on the answers below:

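function getLine($number) {
    global $handle, $index;
    // Jump to the byte offset where line $number starts (counting from 0).
    fseek($handle, $index[$number]);
    // Drop the trailing newline, then split the fields on "|".
    return explode("|", rtrim(fgets($handle), "\r\n"));
}

$handle = fopen("content.txt", "r");
if ($handle === false) {
    die("Could not open content.txt");
}

// Line 0 starts at byte 0; each later line starts where the previous one ended.
$index = array(0);
while (false !== ($line = fgets($handle))) {
    $index[] = ftell($handle);
}

print_r(getLine(18437));

fclose($handle);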
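Since the question mentions SplFileObject, here is a sketch of the same lookup using `SplFileObject::seek()` instead of a hand-built offset index. `getLineSpl()` and the three-line sample file are hypothetical. As far as I know, `seek()` reads forward line by line rather than jumping to a stored byte offset, so for many random lookups in a 95,000-line file the `$index` approach above should be faster.

```php
<?php
// Hypothetical helper: return line $number (0-based) split on "|".
function getLineSpl(SplFileObject $file, int $number): array
{
    $file->seek($number);
    // The DROP_NEW_LINE flag (set below) means current() has no trailing newline.
    return explode('|', $file->current());
}

// Demo with a small hypothetical file in place of content.txt.
$path = tempnam(sys_get_temp_dir(), 'spl');
file_put_contents($path, "a|1\nb|2\nc|3\n");

$file = new SplFileObject($path, 'r');
$file->setFlags(SplFileObject::DROP_NEW_LINE);

$third = getLineSpl($file, 2);
$first = getLineSpl($file, 0); // seeking backwards works too

$file = null; // release the handle before deleting the file
unlink($path);
```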

Solution

  • If your big file is in CSV format, I guess you only need to process it line by line and don't actually need to break it into smaller files. There should be no need to hold 5,000 or more lines in memory at once! To read the file one line at a time, simply use PHP's "low-level" file functions:

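    $fp = fopen("path/to/file", "r");
    if ($fp === false) {
        die("Could not open file");
    }
    
    while (false !== ($line = fgets($fp))) {
        // Strip the line ending, then split the CSV line into values.
        // Note: explode() does not handle quoted fields containing commas; fgetcsv() does.
        $values = explode(",", rtrim($line, "\r\n"));
    
        // Do stuff: run MySQL updates, ...
    }
    
    fclose($fp);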
    

    If you need random access, e.g. to read a line by its line number, you can build a "line index" for your file:

    $fp = fopen("path/to/file", "r");
    
    $index = array(0);
    
    while (false !== ($line = fgets($fp))) {
        $index[] = ftell($fp);  // get the current byte offset
    }
    

    Now $index maps line numbers to byte offsets and you can navigate to a line by using fseek():

    function get_line($number)
    {
        global $fp, $index;
        $offset = $index[$number];
        fseek($fp, $offset);
        return fgets($fp);
    }
    
    $line10 = get_line(10);
    
    // ... Once you are done:
    fclose($fp);
    

    Note that I started line counting at 0, unlike text editors.
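Building on the line-by-line approach, here is a sketch that also covers the question's 5,000-line grouping without writing any intermediate files. It swaps `explode()` for `fgetcsv()`, which handles quoted fields such as `"Smith, John"`. `processCsvInBatches()` and the callback are hypothetical names; the callback is where the MySQL updates would go, and it is only a stand-in here.

```php
<?php
// Hypothetical helper: stream a CSV file and pass rows to $handleBatch in groups
// of $batchSize, so at most one batch of rows is in memory at any time.
// Returns the total number of (non-blank) rows read.
function processCsvInBatches(string $path, callable $handleBatch, int $batchSize = 5000, string $separator = ','): int
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $batch = [];
    $total = 0;
    // Length 0 means "no line-length limit"; enclosure and escape are passed explicitly.
    while (($row = fgetcsv($fp, 0, $separator, '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $batch[] = $row;
        $total++;
        if (count($batch) === $batchSize) {
            $handleBatch($batch);
            $batch = [];
        }
    }
    if ($batch !== []) {
        $handleBatch($batch); // leftover rows that did not fill a whole batch
    }
    fclose($fp);

    return $total;
}

// Demo: a small hypothetical CSV file and a batch size of 2.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "1,\"Smith, John\"\n2,Doe\n\n3,Roe\n");

$batchSizes = [];
$rows = [];
$total = processCsvInBatches($path, function (array $batch) use (&$batchSizes, &$rows) {
    // Stand-in for the MySQL updates: just record what was received.
    $batchSizes[] = count($batch);
    foreach ($batch as $row) {
        $rows[] = $row;
    }
}, 2);
unlink($path);
```

Passing `'|'` as `$separator` would match the pipe-delimited lines used in the question's EDIT.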