Search code examples
phpmysqllarge-data-volumes

Odd behavior when fetching 100K rows from MySQL via PHP


Looking for some ideas here... I have a MySQL table that has 100K rows of test data in it.

I am using a PHP script to fetch rows out of that table, and in this test case, the script is fetching all 100,000 rows (doing some profiling and optimization for large datasets).

I connect to the DB, and execute an unbuffered query:

$result = mysql_unbuffered_query("SELECT * FROM TestTable", $connection) or die('Errant query:  ' . $query);

Then I iterate over the results with:

if ($result) {
    while($tweet = mysql_fetch_assoc($result)) {
        $ctr++;

        if ($ctr > $kMAX_RECORDS) {
            $masterCount += $ctr;
            processResults($results);
            $results = array();
            $ctr = 1;
        }

        $results[] = array('tweet' => $tweet);
    }

    echo "<p/>FINISHED GATHERING RESULTS";
}

function processResults($resultSet) {
    echo "<br/>PROCESSED " . count($resultSet) . " RECORDS";
}

$kMAX_RECORDS = 40000 right now, so I would expect to see output like:

PROCESSED 40000 RECORDS
PROCESSED 40000 RECORDS
PROCESSED 20000 RECORDS

FINISHED GATHERING RESULTS

However, I am consistently seeing:

PROCESSED 39999 RECORDS
PROCESSED 40000 RECORDS

FINISHED GATHERING RESULTS

If I add the output of $ctr right after $ctr++, I get the full 100K records, so it seems to me to be some sort of timing issue or problem with fetching the data from the back-end with MYSQL_FETCH_ASSOC.

On a related note, the code in the while loop is there because prior to breaking up the $results array like this the while loop would just fall over at around 45000 records (same place every time). Is this due to a setting somewhere that I have missed?

Thanks for any input... just need some thoughts on where to look for the answer to this.

Cheers!


Solution

  • When you start going through your result $ctr has no value and doing first incrementation will evaluate it to 0. But when reaching $kMAX_RECORDS you reset it to 1 instead of 0. I don't know however why you see 1 row less in the first time of calling processResults(). I think it should be one more.

    As to missing last 20000 rows notice that you are running processResults() only after $ctr exceeds $kMAX_RECORDS