
PHP: validate the lines of a "text" file while extracting some stats at the same time?


I have a file (from a POST request) that I would like to validate against some constraints:

  • All lines must be composed of ASCII printable characters only.
  • There must be at least one XYZ record (lines that start with @XYZ ).
  • There must be at most 999999 XYZ records

For that purpose I wrote a generic function that reads a file in chunks and passes each line to a callback for validation:

/*
 * Iterates over each line of the file, passing them to the callback function for validation.
 * When the callback function returns false, or when there is an error,
 * the validation process ends.
 * 
 * @param string   $filename       The name of the file to validate.
 * @param callable $callback       The callback function to use for validating each line.
 * @param string   $line_delimiter The line-ending delimiter (default is "\n").
 * @param integer  $buffer_size    The maximum number of bytes to read from the file at a time (default is 8192).
 *
 * @return bool|null True when $callback returned true for each line, false if not, and null on error.
 *
 * @warning When $buffer_size is not large enough to contain a whole line, $callback will validate chunks of lines.
 */
function validate_file_lines($filename, $callback, $line_delimiter = "\n", $buffer_size = 8192)
{
    $handle = fopen($filename, 'rb');
    $is_valid = (false === $handle ? null : true);

    $remainder = '';

    while ( $is_valid && !feof($handle) )
    {
        $buffer = fread($handle, $buffer_size);

        if ( false === $buffer )
        {
            $is_valid = null;
        }
        else
        {
            $lines_array = explode($line_delimiter, $buffer);
            $lines_array_key_last = count($lines_array) - 1;

            $lines_array[0] = $remainder . $lines_array[0];

            if ( $lines_array_key_last !== 0 )
            {
                $remainder = $lines_array[$lines_array_key_last];
                unset($lines_array[$lines_array_key_last]);
            }

            foreach ( $lines_array as $line )
            {
                $is_valid = $callback($line);
                if ( ! $is_valid )
                    break;
            }
        }
    }
    @fclose($handle);
    return $is_valid;
}

Now, using it, I'm trying to validate a file, for example:

HEAD good
@XYZ 1
@XYZ 1
%END

HEAD better
@XYZ 2 2
%END

$xyz_count = 0;
$xyz_min = 1;
$xyz_max = 999999;

$is_valid_line = function($line) use(&$xyz_count, $xyz_max) {
    $is_valid = true;
    if ( ctype_print($line) )
    {
        if ( substr($line, 0, 6) === '@XYZ ' )
        {
            ++$xyz_count;
            $is_valid = $xyz_count <= $xyz_max;
        }
    }
    else if ( '' !== @$line[0] )
    {
        $is_valid = false;
    }
    return $is_valid;
};

var_dump(
    validate_file_lines('file.txt', $is_valid_line) && $xyz_count >= $xyz_min
);

The current output is:

bool(false)

While I'm expecting:

bool(true)

What am I doing wrong?


ASIDE

Does the SPL provide any class for iterating over file lines?


Solution

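  • Your substr() call needs a length of 5, not 6: '@XYZ ' is five characters long, so a six-character slice of a line such as '@XYZ 1' does not equal it, $xyz_count stays at 0, and the final $xyz_count >= $xyz_min check fails. You can also use fgets() to read line by line, and the mode can simply be 'r'. Here's a barebones solution along those lines.

    You might also add debug printing to show where errors occur.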

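    <?php
    $filename = 'file.txt';
    $xyz_max = 999999;

    $fh = fopen($filename, 'r');
    if ($fh === false) {
        exit("Cannot open {$filename}\n");
    }
    $valid = true;
    $xyz_count = 0;
    // Compare against false explicitly so a last line containing only "0" does not end the loop early.
    while ($valid && ($line = fgets($fh)) !== false) {
        // fgets() keeps the line ending, which ctype_print() would reject.
        $line = rtrim($line, "\r\n");
        // ctype_print('') is false, so empty lines are accepted explicitly.
        if ($line !== '' && !ctype_print($line)) $valid = false;
        if (substr($line, 0, 5) === '@XYZ ') $xyz_count++;
        // "At most $xyz_max records" means only a count above the limit is invalid.
        if ($xyz_count > $xyz_max) $valid = false;

        // if (!$valid) echo "LINE (fail): {$line}\n";
    }
    if ($xyz_count === 0) $valid = false;
    fclose($fh);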
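
Regarding the aside: the SPL does provide a class for iterating over file lines, SplFileObject. The following is a minimal sketch of the same validation built on it, not a definitive implementation. The function name validate_xyz_file() and its return shape are hypothetical (they do not come from the question), and it assumes that empty lines should be accepted, as the sample file suggests.

```php
<?php
/**
 * Hypothetical helper (not from the original post): validates a file line by
 * line with SplFileObject and counts "@XYZ " records along the way.
 *
 * @return array [bool is the file valid, int number of @XYZ records seen]
 */
function validate_xyz_file(string $filename, int $xyz_min = 1, int $xyz_max = 999999): array
{
    try {
        $file = new SplFileObject($filename, 'r');
    } catch (RuntimeException | LogicException $e) {
        return [false, 0]; // the file could not be opened
    }
    // DROP_NEW_LINE removes the line ending, which ctype_print() would reject.
    $file->setFlags(SplFileObject::DROP_NEW_LINE);

    $xyz_count = 0;
    foreach ($file as $line) {
        // The iterator can yield a trailing empty entry when the file ends
        // with a newline; the cast and the check below make that harmless.
        $line = (string) $line;
        // Assumption: empty lines are allowed (ctype_print('') returns false).
        if ($line === '') {
            continue;
        }
        if (!ctype_print($line)) {
            return [false, $xyz_count];
        }
        // '@XYZ ' is 5 bytes long, so compare exactly 5 bytes.
        if (strncmp($line, '@XYZ ', 5) === 0 && ++$xyz_count > $xyz_max) {
            return [false, $xyz_count];
        }
    }
    return [$xyz_count >= $xyz_min, $xyz_count];
}

// Demo with the sample file from the question, written to a temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'xyz');
file_put_contents($tmp, "HEAD good\n@XYZ 1\n@XYZ 1\n%END\n\nHEAD better\n@XYZ 2 2\n%END\n");
[$ok, $count] = validate_xyz_file($tmp);
var_dump($ok, $count);
unlink($tmp);
```

Returning the count together with the verdict avoids the by-reference closure variable used in the question, and the early returns stop reading as soon as a line fails.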