Suppose we're scraping data and end up with both exact duplicates and semi-duplicates (same title, different desc).
Given an input array that looks something like this:
$inputArr = [
[
'title' => 'Test0',
'desc' => 'Short Desc',
],
[
'title' => 'Test5',
'desc' => 'Short Desc',
],
[
'title' => 'Test0',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test0.5',
'desc' => 'Short Desc',
],
[
'title' => 'Test1',
'desc' => 'Short Desc',
],
[
'title' => 'Test1',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test1.5',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test3',
'desc' => 'Short Desc',
],
[
'title' => 'Test2',
'desc' => 'Short Desc',
],
[
'title' => 'Test3.75',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test3.25',
'desc' => 'Short Desc',
],
[
'title' => 'Test2',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test3',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test5',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test3.5',
'desc' => 'Short Desc',
],
[
'title' => 'Test4',
'desc' => 'Short Desc',
],
[
'title' => 'Test5',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test4.5',
'desc' => 'Short Desc',
],
[
'title' => 'Test4',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test5',
'desc' => 'Much Longer Than Short Desc',
],
];
The resulting array must contain exactly ONE entry per title value: the one whose desc is the longest string. Where several entries for the same title have desc values of equal length, all but one of them should be removed.
For example, the final output should look like:
$resultArr = [
[
'title' => 'Test0',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test0.5',
'desc' => 'Short Desc',
],
[
'title' => 'Test1',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test1.5',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test2',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test3',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test3.25',
'desc' => 'Short Desc',
],
[
'title' => 'Test3.5',
'desc' => 'Short Desc',
],
[
'title' => 'Test3.75',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test4',
'desc' => 'Much Longer Than Short Desc',
],
[
'title' => 'Test4.5',
'desc' => 'Short Desc',
],
[
'title' => 'Test5',
'desc' => 'Much Longer Than Short Desc',
],
];
I've tried several different solutions and I don't like any of them. No matter how I come at it, it feels like a kludge, and I feel like I'm missing an obvious and elegant solution.
I'm hoping someone has a good suggestion for something cleaner than the sorting, looping, and filtering I've tried.
You can do it like this:
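$result = []; // keyed by title; initialized so an empty input still works
foreach ($inputArr as $item) {
    $title = $item['title'];
    // Skip this item if the entry already stored for its title has a longer desc.
    if (isset($result[$title]) && strlen($result[$title]['desc']) > strlen($item['desc'])) {
        continue;
    }
    $result[$title] = $item;
}
$result = array_values($result); // drop the title keys again
print_r($result);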
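This builds a new associative array that uses the title as its key. While looping over the original array, if an entry for the title already exists and its desc is longer than the current item's, the current item is skipped; otherwise the current item is stored, replacing any existing entry for that title. When the two desc values have the same length, the later item replaces the earlier one. Finally, array_values discards the title keys.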
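If the same logic is needed in more than one place, it could be wrapped in a function. The following is only a sketch: the name dedupeByLongestDesc and the optional $length parameter are hypothetical and not part of the answer above. The parameter is there because strlen counts bytes, so for multibyte text you might prefer to pass 'mb_strlen' (assuming the mbstring extension is available).

```php
<?php
// Hypothetical helper: keep one item per title, preferring the longest desc.
// $length is an assumed parameter; it defaults to strlen (byte length).
function dedupeByLongestDesc(array $items, ?callable $length = null): array
{
    $length = $length ?? 'strlen';
    $byTitle = [];
    foreach ($items as $item) {
        $title = $item['title'];
        // Skip the item if the stored entry has a strictly longer desc;
        // on equal lengths the later item replaces the stored one.
        if (isset($byTitle[$title]) && $length($byTitle[$title]['desc']) > $length($item['desc'])) {
            continue;
        }
        $byTitle[$title] = $item;
    }
    return array_values($byTitle);
}

// Small sample in the same shape as $inputArr.
$sample = [
    ['title' => 'Test0', 'desc' => 'Short Desc'],
    ['title' => 'Test5', 'desc' => 'Short Desc'],
    ['title' => 'Test0', 'desc' => 'Much Longer Than Short Desc'],
];
$deduped = dedupeByLongestDesc($sample);
print_r($deduped);
```

Calling dedupeByLongestDesc($sample, 'mb_strlen') would compare character counts instead of byte counts.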
You can also use array_reduce:
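$result = array_reduce($inputArr, function ($c, $i) {
    // Store the item if its title is new or its desc is longer than the stored one
    // (on equal lengths the item stored first is kept).
    if (!isset($c[$i['title']]) || strlen($c[$i['title']]['desc']) < strlen($i['desc'])) {
        $c[$i['title']] = $i;
    }
    return $c;
}, []); // start from [] so an empty input yields [] rather than null
$result = array_values($result);
print_r($result);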
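Note that both snippets return the titles in the order they first appear in $inputArr, whereas the expected output in the question is listed in ascending title order. If that ordering matters, one possible extra step is to sort by title afterwards. A minimal sketch, assuming a plain byte-wise comparison with strcmp is what you want (it fits the sample titles, but a title such as Test10 would sort before Test2). The small $result array here is just a stand-in for the deduplicated output:

```php
<?php
// Stand-in for the deduplicated $result, in order of first appearance.
$result = [
    ['title' => 'Test0', 'desc' => 'Much Longer Than Short Desc'],
    ['title' => 'Test5', 'desc' => 'Much Longer Than Short Desc'],
    ['title' => 'Test0.5', 'desc' => 'Short Desc'],
    ['title' => 'Test3.25', 'desc' => 'Short Desc'],
    ['title' => 'Test3', 'desc' => 'Much Longer Than Short Desc'],
];

// Assumption: byte-wise string order on title is the desired order.
usort($result, function (array $a, array $b): int {
    return strcmp($a['title'], $b['title']);
});

print_r(array_column($result, 'title'));
```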