Search code examples
phplaravellaravel-validation

How to Validate huge data using LazyCollection Laravel


I'm trying to validate huge amount of data using Laravel LazyCollection, I test the code with 15 thousands rows each contains 9 columns to be validated.

The scenario is user upload the excel file, then convert it to array, after that the validation of data begins

The Controller :

class ImportBudget extends Controller
{
    use SpreadsheetTrait;

    public function import(Request $request) 
    {
        // File format validation
        $validatedFile = SpreadsheetTrait::fileHandler($request);
        
        if (!is_array($validatedFile)) return $validatedFile;

        $messages = [
            'required' => 'This field is required, please input.',
            'numeric' => 'Please input number value only.',
            'integer' => 'Please input integer value only.',
            'min' => 'Minimal input value is :min.',
            'required_if' => 'This field is required, please input.',
        ];
        
        // Cars data which contain 30k++ of arrays
        $car = Cache::get('car')->pluck('slug')->toArray();

        // Start Data Validation
        $validatedData = LazyCollection::make(function () use($validatedFile) {
            $data = collect($validatedFile);
            yield $data;
        })->chunk(1000)->each(function ($rows) use ($car, $messages) {
            return Validator::make($rows->toArray(), [
                '*.0' => ['required', function ($attribute, $value, $fail) {
                    if (!in_array($value , config('constants.purposes'))) {
                        $fail('The purpose field is invalid.');
                    }
                }],
                '*.1' => 'required_if:*.0,PRODUCTION-PROJECT',
                '*.2' => 'required',
                '*.3' => 'required',
                '*.4' => 'required',
                '*.5' => 'required',
                '*.6' => 'required',
                '*.7' => ['required', function ($attribute, $value, $fail) use($car) {
                    if (!in_array($value, $curr)) {
                        $fail('The car is invalid.');
                    }
                }],
                '*.8' => 'required|numeric|min:0',
                '*.9' => 'required|integer|min:1',
            ], $messages);
        });
    }
}

The code above resulting error max executing times :

{
    "message": "Maximum execution time of 60 seconds exceeded",
    "exception": "Symfony\\Component\\ErrorHandler\\Error\\FatalError",
    "file": "C:\\laragon\\www\\bulus-jai\\vendor\\laravel\\framework\\src\\Illuminate\\Collections\\Arr.php",
    "line": 115,
    "trace": []
}

Even i add the execution time to 120, the result remain the same

Note that the $car variable contains 30k++ arrays, i think this also make this validation slower aswell, but i don't have any idea how to make it more simpler.

What is the best solution for this ?

Update 1

I tried to switch using my own validation script by create the services , and the result is pretty good (around 5 ~ 10 seconds for 15k rows) :


class BatchValidationServices {

    public static function budgetValidation($validatedFile)
    {
        $requiredFields = [
            0 => true,
            1 => false,
            2 => true,
            3 => true,
            4 => true,
            5 => true,
            6 => true,
            7 => true,
            8 => true,
            9 => true,
            10 => false
        ];

        $curr = collect(Cache::get('curr'))
            ->where('term.status','BUDGETING')
            ->pluck('name')
            ->toArray();
        $item = Cache::get('item')->pluck('item_code')->toArray();
        $car = Cache::get('car')->pluck('slug')->toArray();
        $deliveryPlan = Cache::get('delivery');
        $orderPlan = Cache::get('order');

        $collectedData = LazyCollection::make(function () use($validatedFile) {
            $data = collect($validatedFile);
            yield $data;
        }); 

        $errors = [];

        $collectedData->chunk(1000)
            ->each(function ($collection)  use (
                $orderPlan,
                $deliveryPlan,
                $car,
                $curr,
                $item,
                $requiredFields, &$errors){
                foreach ($collection->toArray() as $array) {
                    foreach ($array as $key => $row) {
    
                        // Validate blank rows
                        for ($i=0; $i < count($requiredFields); $i++) {
                            if ($row[$i] === null &&
                            $requiredFields[$i] === true) {
                                array_push($errors, [$key.'.'.$i => 'This field is required.']);
                            }
                        }
    
                        // Validate purpose validity
                        if (!in_array($row[0], config('constants.purposes'))) array_push($errors, [$key.'.0' => 'Purpose is invalid.']);
    
                        // Validate required preparation item
                        $preparationItems = array_column(config('constants.preparations'), 'item');
    
                        if ($row[0] === 'PRODUCTION-PROJECT' && $row[1] === null) {
                            array_push($errors, [$key.'.1' => 'This field is required if the purpose is PRODUCTION-PROJECT.']);  
                        } elseif ($row[0] === 'PRODUCTION-PROJECT' && $row[1] !== null) {
                            if (!in_array($row[1], $preparationItems)) array_push($errors, [$key.'.1' => 'Production preparation item is invalid.']);
                        }
    
                        // Validate order plan & delivery plan 
                        if (!in_array($row[2], $orderPlan)) array_push($errors, [$key.'.2' => 'Order plan is invalid.']);
                        
                        if (!in_array($row[3], $deliveryPlan)) {
                            array_push($errors, [$key.'.3' => 'Delivery plan is invalid.']);
                        } else {
                            if ($row[3] < $row[2]) array_push($errors, [$key.'.3' => 'Delivery plan should be after or at least in the same period as order plan.']);
                        }
                        
                        // Validate destination-carline
                        if (!in_array($row[4], $car)) array_push($errors, [$key.'.4' => 'Destination-carline is invalid.']);

                        // Validate Origin
                        if (!in_array($row[5], ['DOMESTIC', 'IMPORT'])) array_push($errors, [$key.'.5' => 'Origin supplier is invalid, please choose between DOMESTIC or IMPORT only.']);

                        // Validate Item
                        if(!in_array($row[6], $item)) array_push($errors, [$key.'.6' => 'Item code is invalid.']);

                        // Validate Currency
                        if (!in_array($row[7], $curr)) {
                            array_push($errors, [$key.'.7' => 'Currency is invalid.']);
                        } else {
                            if ($row[5] === 'IMPORT' && $row[7] === 'IDR') array_push($errors, [$key.'.7' => 'IDR currency shouldn\'t be used for IMPORT.']);
                        }

                        // Validate Price
                        if (!is_numeric($row[8])) {
                            array_push($errors, [$key.'.8' => 'Please only input numerical value.']);
                        } else {
                            if ($row[8] <= 0) array_push($errors, [$key.'.8' => 'Value must be greater than 0.']);
                        }

                        // Validate Qty
                        if (!is_integer($row[9])) {
                            array_push($errors, [$key.'.9' => 'Please only input numerical value.']);
                        } else {
                            if ($row[9] <= 0) array_push($errors, [$key.'.9' => 'Value must be greater than 0.']);
                        }
                    }
                }

                return $errors;
            });

        if (count($errors) > 0) {
            return $errors;
        } else {
            return true;
        }
    }
}

However I am still wondering why, when i use the built in Laravel validation, it's taking so long? I'd prefer using the Laravel validation because the code is more readable.


Solution

  • Since you have already loaded the entire contents of the spreadsheet into the $validatedFile variable, why make a LazyCollection object? Their only purpose is to save memory by not loading large data sets into memory. Your validation rules using closures can also be cleaned up. This isn't just a cosmetic change: in_array() is notoriously slow.

    class ImportBudget extends Controller
    {
        use SpreadsheetTrait;
    
        public function import(Request $request) 
        {
            // File format validation
            $validatedFile = SpreadsheetTrait::fileHandler($request);
            
            if (!is_array($validatedFile)) {
                // this should be throwing an exception of some kind
                return $validatedFile;
            }
    
            $purposes = config('constants.purposes');
    
            // Cars data which contain 30k++ of arrays
            $car = Cache::get('car')->pluck('slug');
    
            $rules = [
                '*.0' => ['required', Rule::in($purposes)],
                '*.1' => ['required_if:*.0,PRODUCTION-PROJECT'],
                '*.2' => ['required'],
                '*.3' => ['required'],
                '*.4' => ['required'],
                '*.5' => ['required'],
                '*.6' => ['required'],
                '*.7' => ['required', Rule::in($car)],
                '*.8' => ['required', 'numeric', 'min:0'],
                '*.9' => ['required', 'integer', 'min:1'],
            ];
    
            $messages = [
                'required' => 'This field is required, please input.',
                'numeric' => 'Please input number value only.',
                'integer' => 'Please input integer value only.',
                'min' => 'Minimal input value is :min.',
                'required_if' => 'This field is required, please input.',
            ];
    
            // Start Data Validation
            $validatedData = Validator::make($validatedFile, $rules, $messages));
        }
    }
    

    If slug is guaranteed to be unique, speed could be improved by using it as the index of the array:

    $car = Cache::get('car')->pluck('id', 'slug');
    

    Then your validation rule becomes a super quick closure that only has to check for the existence of the key:

    '*.7' => ['required', fn ($k, $v, $f) => $car[$v] ?? $f("The car in $k is invalid")],