I'm trying to validate huge amount of data using Laravel LazyCollection, I test the code with 15 thousands rows each contains 9 columns to be validated.
The scenario is user upload the excel file, then convert it to array, after that the validation of data begins
The Controller :
class ImportBudget extends Controller
{
use SpreadsheetTrait;
public function import(Request $request)
{
// File format validation
$validatedFile = SpreadsheetTrait::fileHandler($request);
if (!is_array($validatedFile)) return $validatedFile;
$messages = [
'required' => 'This field is required, please input.',
'numeric' => 'Please input number value only.',
'integer' => 'Please input integer value only.',
'min' => 'Minimal input value is :min.',
'required_if' => 'This field is required, please input.',
];
// Cars data which contain 30k++ of arrays
$car = Cache::get('car')->pluck('slug')->toArray();
// Start Data Validation
$validatedData = LazyCollection::make(function () use($validatedFile) {
$data = collect($validatedFile);
yield $data;
})->chunk(1000)->each(function ($rows) use ($car, $messages) {
return Validator::make($rows->toArray(), [
'*.0' => ['required', function ($attribute, $value, $fail) {
if (!in_array($value , config('constants.purposes'))) {
$fail('The purpose field is invalid.');
}
}],
'*.1' => 'required_if:*.0,PRODUCTION-PROJECT',
'*.2' => 'required',
'*.3' => 'required',
'*.4' => 'required',
'*.5' => 'required',
'*.6' => 'required',
'*.7' => ['required', function ($attribute, $value, $fail) use($car) {
if (!in_array($value, $curr)) {
$fail('The car is invalid.');
}
}],
'*.8' => 'required|numeric|min:0',
'*.9' => 'required|integer|min:1',
], $messages);
});
}
}
The code above resulting error max executing times :
{
"message": "Maximum execution time of 60 seconds exceeded",
"exception": "Symfony\\Component\\ErrorHandler\\Error\\FatalError",
"file": "C:\\laragon\\www\\bulus-jai\\vendor\\laravel\\framework\\src\\Illuminate\\Collections\\Arr.php",
"line": 115,
"trace": []
}
Even i add the execution time to 120, the result remain the same
Note that the $car variable contains 30k++ arrays, i think this also make this validation slower aswell, but i don't have any idea how to make it more simpler.
What is the best solution for this ?
Update 1
I tried to switch using my own validation script by create the services , and the result is pretty good (around 5 ~ 10 seconds for 15k rows) :
class BatchValidationServices {
public static function budgetValidation($validatedFile)
{
$requiredFields = [
0 => true,
1 => false,
2 => true,
3 => true,
4 => true,
5 => true,
6 => true,
7 => true,
8 => true,
9 => true,
10 => false
];
$curr = collect(Cache::get('curr'))
->where('term.status','BUDGETING')
->pluck('name')
->toArray();
$item = Cache::get('item')->pluck('item_code')->toArray();
$car = Cache::get('car')->pluck('slug')->toArray();
$deliveryPlan = Cache::get('delivery');
$orderPlan = Cache::get('order');
$collectedData = LazyCollection::make(function () use($validatedFile) {
$data = collect($validatedFile);
yield $data;
});
$errors = [];
$collectedData->chunk(1000)
->each(function ($collection) use (
$orderPlan,
$deliveryPlan,
$car,
$curr,
$item,
$requiredFields, &$errors){
foreach ($collection->toArray() as $array) {
foreach ($array as $key => $row) {
// Validate blank rows
for ($i=0; $i < count($requiredFields); $i++) {
if ($row[$i] === null &&
$requiredFields[$i] === true) {
array_push($errors, [$key.'.'.$i => 'This field is required.']);
}
}
// Validate purpose validity
if (!in_array($row[0], config('constants.purposes'))) array_push($errors, [$key.'.0' => 'Purpose is invalid.']);
// Validate required preparation item
$preparationItems = array_column(config('constants.preparations'), 'item');
if ($row[0] === 'PRODUCTION-PROJECT' && $row[1] === null) {
array_push($errors, [$key.'.1' => 'This field is required if the purpose is PRODUCTION-PROJECT.']);
} elseif ($row[0] === 'PRODUCTION-PROJECT' && $row[1] !== null) {
if (!in_array($row[1], $preparationItems)) array_push($errors, [$key.'.1' => 'Production preparation item is invalid.']);
}
// Validate order plan & delivery plan
if (!in_array($row[2], $orderPlan)) array_push($errors, [$key.'.2' => 'Order plan is invalid.']);
if (!in_array($row[3], $deliveryPlan)) {
array_push($errors, [$key.'.3' => 'Delivery plan is invalid.']);
} else {
if ($row[3] < $row[2]) array_push($errors, [$key.'.3' => 'Delivery plan should be after or at least in the same period as order plan.']);
}
// Validate destination-carline
if (!in_array($row[4], $car)) array_push($errors, [$key.'.4' => 'Destination-carline is invalid.']);
// Validate Origin
if (!in_array($row[5], ['DOMESTIC', 'IMPORT'])) array_push($errors, [$key.'.5' => 'Origin supplier is invalid, please choose between DOMESTIC or IMPORT only.']);
// Validate Item
if(!in_array($row[6], $item)) array_push($errors, [$key.'.6' => 'Item code is invalid.']);
// Validate Currency
if (!in_array($row[7], $curr)) {
array_push($errors, [$key.'.7' => 'Currency is invalid.']);
} else {
if ($row[5] === 'IMPORT' && $row[7] === 'IDR') array_push($errors, [$key.'.7' => 'IDR currency shouldn\'t be used for IMPORT.']);
}
// Validate Price
if (!is_numeric($row[8])) {
array_push($errors, [$key.'.8' => 'Please only input numerical value.']);
} else {
if ($row[8] <= 0) array_push($errors, [$key.'.8' => 'Value must be greater than 0.']);
}
// Validate Qty
if (!is_integer($row[9])) {
array_push($errors, [$key.'.9' => 'Please only input numerical value.']);
} else {
if ($row[9] <= 0) array_push($errors, [$key.'.9' => 'Value must be greater than 0.']);
}
}
}
return $errors;
});
if (count($errors) > 0) {
return $errors;
} else {
return true;
}
}
}
However I am still wondering why, when i use the built in Laravel validation, it's taking so long? I'd prefer using the Laravel validation because the code is more readable.
Since you have already loaded the entire contents of the spreadsheet into the $validatedFile
variable, why make a LazyCollection
object? Their only purpose is to save memory by not loading large data sets into memory. Your validation rules using closures can also be cleaned up. This isn't just a cosmetic change: in_array()
is notoriously slow.
class ImportBudget extends Controller
{
use SpreadsheetTrait;
public function import(Request $request)
{
// File format validation
$validatedFile = SpreadsheetTrait::fileHandler($request);
if (!is_array($validatedFile)) {
// this should be throwing an exception of some kind
return $validatedFile;
}
$purposes = config('constants.purposes');
// Cars data which contain 30k++ of arrays
$car = Cache::get('car')->pluck('slug');
$rules = [
'*.0' => ['required', Rule::in($purposes)],
'*.1' => ['required_if:*.0,PRODUCTION-PROJECT'],
'*.2' => ['required'],
'*.3' => ['required'],
'*.4' => ['required'],
'*.5' => ['required'],
'*.6' => ['required'],
'*.7' => ['required', Rule::in($car)],
'*.8' => ['required', 'numeric', 'min:0'],
'*.9' => ['required', 'integer', 'min:1'],
];
$messages = [
'required' => 'This field is required, please input.',
'numeric' => 'Please input number value only.',
'integer' => 'Please input integer value only.',
'min' => 'Minimal input value is :min.',
'required_if' => 'This field is required, please input.',
];
// Start Data Validation
$validatedData = Validator::make($validatedFile, $rules, $messages));
}
}
If slug
is guaranteed to be unique, speed could be improved by using it as the index of the array:
$car = Cache::get('car')->pluck('id', 'slug');
Then your validation rule becomes a super quick closure that only has to check for the existence of the key:
'*.7' => ['required', fn ($k, $v, $f) => $car[$v] ?? $f("The car in $k is invalid")],