Tags: php, multidimensional-array, duplicates, filtering, unique

Filter/Remove rows where column value is found more than once in a multidimensional array


I need to remove rows from my input array where duplicate values occur in a specific column.

Sample array:

$array = [
    ['user_id' => 82, 'ac_type' => 1],
    ['user_id' => 80, 'ac_type' => 5],
    ['user_id' => 76, 'ac_type' => 1],
    ['user_id' => 82, 'ac_type' => 1],
    ['user_id' => 80, 'ac_type' => 5]
];

I'd like to filter by user_id to ensure uniqueness, so my output should look like this:

[
    ['user_id' => 82, 'ac_type' => 1],
    ['user_id' => 80, 'ac_type' => 5],
    ['user_id' => 76, 'ac_type' => 1]
]

I've already found this page, but none of the answers work for my situation:

$result = array_unique($array, SORT_REGULAR);

and

$result = array_map("unserialize", array_unique(array_map("serialize", $array)));

and

$result = array();
foreach ($array as $k => $v) {
    $result[implode($v)] = $v;
}
$result = array_values($result);
print_r($result);

but duplicate rows still exist.


Solution

  • For a clearer "minimal, complete, verifiable example", I'll use the following input array in my demos:

    $array = [
        ['user_id' => 82, 'ac_type' => 1],
        ['user_id' => 80, 'ac_type' => 5],
        ['user_id' => 76, 'ac_type' => 1],
        ['user_id' => 82, 'ac_type' => 2],
        ['user_id' => 80, 'ac_type' => 5]
    ];
    // elements [0] and [3] have the same user_id, but different ac_type
    // elements [1] and [4] have identical row data
    
    1. Unconditionally push rows into a result array using the user_id values as temporary first-level keys, then re-index with array_values(). This approach overwrites earlier duplicate rows with later-occurring ones.

      array_column demo:

      var_export(array_values(array_column($array, null, 'user_id')));
      

      foreach demo:

      $result = [];
      foreach ($array as $row) {
          $result[$row['user_id']] = $row;
      }
      var_export(array_values($result));
      

      Output:

      [
          ['user_id' => 82, 'ac_type' => 2], // was input row [3]
          ['user_id' => 80, 'ac_type' => 5], // was input row [4]
          ['user_id' => 76, 'ac_type' => 1]  // was input row [2]
      ]
      
    2. Use an isset() condition or the null coalescing assignment operator to preserve the first-occurring row for each user_id while removing later duplicates.

      foreach null coalescing assignment demo:

      $result = [];
      foreach ($array as $a) {
          $result[$a['user_id']] ??= $a; // only store if first occurrence of user_id
      }
      var_export(array_values($result)); // re-index and print
      

      foreach isset demo:

      $result = [];
      foreach ($array as $a) {
          if (!isset($result[$a['user_id']])) {
              $result[$a['user_id']] = $a; // only store if first occurrence of user_id
          }
      }
      var_export(array_values($result)); // re-index and print
      

      Output:

      [
          ['user_id' => 82, 'ac_type' => 1], // was input row [0]
          ['user_id' => 80, 'ac_type' => 5], // was input row [1]
          ['user_id' => 76, 'ac_type' => 1]  // was input row [2]
      ]
      
    3. It is also possible to keep the first occurrence of each user_id while pushing data unconditionally (no condition needed), but the row order may differ between the input and the output (if that matters to you).

      array_reverse, array_column demo:

      var_export(array_values(array_column(array_reverse($array), null, 'user_id')));
      

      array_reduce demo:

      var_export(
          array_values(
              array_reduce(
                  $array,
                  // array_replace() keeps values already present in $res when keys collide,
                  // so the first occurrence of each user_id is preserved
                  fn($res, $row) => array_replace([$row['user_id'] => $row], $res),
                  []
              )
          )
      );
      

      foreach array_reverse demo:

      $result = [];
      foreach (array_reverse($array) as $row) {
          $result[$row['user_id']] = $row;
      }
      var_export(array_values($result));
      

      Output:

      [
          ['user_id' => 80, 'ac_type' => 5], // was input row [1]
          ['user_id' => 82, 'ac_type' => 1], // was input row [0]
          ['user_id' => 76, 'ac_type' => 1]  // was input row [2]
      ]
      

    A warning about a fringe case not expressed in this example: if the column values you rely on as identifiers can be corrupted when cast to array keys, the above techniques will give unreliable results. For instance, PHP does not allow float values as keys; they are truncated to integers (and newer PHP versions raise a deprecation notice when precision is lost). Only in these fringe cases might you consider using inefficient, iterated calls of in_array() to evaluate uniqueness, as sketched below.
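
    A minimal sketch of that fallback, assuming user_id is still the column being checked (the strict flag on in_array() and the $seenIds variable name are my own additions):

    $result = [];
    $seenIds = []; // collected user_id values, stored as array values rather than keys
    foreach ($array as $row) {
        if (!in_array($row['user_id'], $seenIds, true)) { // strict comparison; no key casting involved
            $seenIds[] = $row['user_id'];
            $result[] = $row; // keep only the first occurrence of each user_id
        }
    }
    var_export($result); // already sequentially indexed; no array_values() needed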


    Using array_unique(..., SORT_REGULAR) is only suitable when determining uniqueness by ENTIRE rows of data.

    array_unique demo:

    var_export(array_unique($array, SORT_REGULAR));
    

    Output:

    [
        ['user_id' => 82, 'ac_type' => 1], // was input row [0]
        ['user_id' => 80, 'ac_type' => 5], // was input row [1]
        ['user_id' => 76, 'ac_type' => 1], // was input row [2]
        ['user_id' => 82, 'ac_type' => 2]  // was input row [3]
    ]
    

    As a slight extension of requirements, if uniqueness must be determined based on more than one column, but not all columns, then use a "composite key" composed of the meaningful column values. The following uses the null coalescing assignment operator, but the other techniques from #2 and #3 can also be implemented.

    foreach composite key demo:

    $result = [];
    foreach ($array as $row) {
        $compositeKey = $row['user_id'] . '_' . $row['ac_type'];
        $result[$compositeKey] ??= $row; // only store if first occurrence of the composite key
    }
    var_export(array_values($result)); // re-index and print
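
    Output (running the above against the demo input array; only the duplicated row [4] is dropped):

    [
        ['user_id' => 82, 'ac_type' => 1], // was input row [0]
        ['user_id' => 80, 'ac_type' => 5], // was input row [1]
        ['user_id' => 76, 'ac_type' => 1], // was input row [2]
        ['user_id' => 82, 'ac_type' => 2]  // was input row [3]
    ]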
    

    Though I have never used it, the Ouzo Goodies library seems to have a uniqueBy() method that is relevant to this topic. See the unexplained snippet here.
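
    For comparison, a rough, dependency-free stand-in for that kind of helper might look like the following; the uniqueBy() name and its callable-based signature are hypothetical here and are not taken from the library's actual API:

    // Hypothetical helper: keep the first row for each distinct key produced
    // by $keyExtractor, then re-index the result.
    function uniqueBy(array $rows, callable $keyExtractor): array
    {
        $kept = [];
        foreach ($rows as $row) {
            $kept[$keyExtractor($row)] ??= $row; // first occurrence wins
        }
        return array_values($kept);
    }

    var_export(uniqueBy($array, fn($row) => $row['user_id']));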