Search code examples
phpcsvmultidimensional-arraymergeleft-join

Implement left join with PHP between two csv files


As this solution was adapted from a well-voted answer elsewhere, I was not expecting to encounter problems.

Problem: I want to LEFT JOIN file0.csv with file1.csv.

file0.csv

+-----------------+-------------+--------------+
| Manufacturer ID |    image    | description  |
+-----------------+-------------+--------------+
| SKU231          | image1.jpg  | A box.       |
| SKUAG1          | image22.jpg | Another box. |
| SKU21D          | image7a.png | A third box. |
+-----------------+-------------+--------------+

file1.csv:

+--------+--------+--------+-------+-------+
|  mpn   | length | height | width | title |
+--------+--------+--------+-------+-------+
| SKU231 |     22 |     14 |    10 | Box 1 |
| SKUAG1 |     12 |      6 |     6 | Box 2 |
| SKU21D |      5 |      8 |     5 | Box 3 |
+--------+--------+--------+-------+-------+

Desired result (file2.csv):

+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
| Manufacturer ID |    image    | description  |  mpn   | length | height | width | title |
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
| SKU231          | image1.jpg  | A box.       | SKU231 |     22 |     14 |    10 | Box 1 |
| SKUAG1          | image22.jpg | Another box. | SKUAG1 |     12 |      6 |     6 | Box 2 |
| SKU21D          | image7a.png | A third box. | SKU21D |      5 |      8 |     5 | Box 3 |
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+

PHP function to LEFT JOIN file0.csv with file1.csv on Manufacturer ID and mpn respectively:

function my_csv_join(array $csv_input_file_array, $csv_output_file, $html_preview, $left_join_on, $right_join_on = NULL) {
    if (count($csv_input_file_array) > 2) {
        echo 'This function probably only works for 2 csv files being joined at a time.  Reapply iteratively if needed.  Exiting now.';
        exit;
    } else {
        for ($x = 0; $x <= 1; $x++) {
            //get csv file to array 
            ${'file' . $x} = fopen($csv_input_file_array[$x], "r"); //Dynamic variables using braces: https://stackoverflow.com/a/9257536/9095603
    
            while (!feof(${'file' . $x})) {
                ${'csv_array' . $x}[] = fgetcsv(${'file' . $x});
            }
            fclose(${'file' . $x});
    
            ${'csv_array' . $x} = array_filter(${'csv_array' . $x}); // gets rid of last empty array in case present
            //CREATE HEADERED ARRAY
            ${'header' . $x} = array_shift(${'csv_array' . $x});
            foreach (${'csv_array' . $x} as ${'product_data_array' . $x} {
                ${'headered_array' . $x}[] = array_combine(${'header' . $x}, ${'product_data_array' . $x});
            }        
        }
        // How to simulate the SQL LEFT JOIN operation using PHP arrays?  https://stackoverflow.com/a/25837426

        //function to simulate the left join
        $left = $headered_array0;
        $right = $headered_array1;
        $final = array();

        if (empty($right_join_on)) // if $right_join_on omitted implies $right_join_on has the same value as $left_join_on
            $right_join_on = $left_join_on;
        foreach ($left AS $k => $v) {
            $final[$k] = $v; //basically keeping everything; $left just becomes $final
            foreach ($right AS $kk => $vv) {
                if ($v[$left_join_on] == $vv[$right_join_on]) { 
                    foreach ($vv AS $key => $val)
                        $final[$k][$key] = $val; 
                } else {
                    foreach ($vv AS $key => $val)
                   $final[$k][$key] = NULL; 
                }
            }
        }
        if ($html_preview == 'y') {
            echo '<pre>';
                var_dump($final);
            echo '</pre>';
        }
    
        // REINSTATE HEADERS
        // var_dump($final[0]);
        $indented_header = array(
            0 => array_keys($final[0])
        );
        $re_headered_array = array_merge($indented_header, $final);
    
        // write array to csv file
        $file2 = fopen($csv_output_file, "w");
        foreach ($re_headered_array as $line) {
            fputcsv($file2, $line);
        }
    
        fclose($file2);
    }
}

Call to function my_csv_join():

my_csv_join(array('C:\xampp\htdocs\kalugi\file0.csv','C:\xampp\htdocs\kalugi\file1.csv'), 'C:\xampp\htdocs\kalugi\file2.csv','y','Manufacturer ID','mpn');

Actual result shows not all records matching on the file0."Manufacturer ID" = file1.mpn are being matched. Thus some expected rows that satisfied the join condition were not joined. We have NULL in place of their values:

var_dump of result:

array(3) {
  [0]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKU231"
    ["image"]=>
    string(10) "image1.jpg"
    ["description"]=>
    string(6) "A box."
    ["mpn"]=>
    NULL
    ["length"]=>
    NULL
    ["height"]=>
    NULL
    ["width"]=>
    NULL
    ["title"]=>
    NULL
  }
  [1]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKUAG1"
    ["image"]=>
    string(11) "image22.jpg"
    ["description"]=>
    string(12) "Another box."
    ["mpn"]=>
    NULL
    ["length"]=>
    NULL
    ["height"]=>
    NULL
    ["width"]=>
    NULL
    ["title"]=>
    NULL
  }
  [2]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKU21D"
    ["image"]=>
    string(11) "image7a.png"
    ["description"]=>
    string(12) "A third box."
    ["mpn"]=>
    string(6) "SKU21D"
    ["length"]=>
    string(1) "5"
    ["height"]=>
    string(1) "8"
    ["width"]=>
    string(1) "5"
    ["title"]=>
    string(5) "Box 3"
  }
}

Result as written out to $file2 (file2.csv):

    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+
    | Manufacturer ID |    image    | description  |  mpn   | length | height | width | title |
    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+
    | SKU231          | image1.jpg  | A box.       |        |        |        |       |       |
    | SKUAG1          | image22.jpg | Another box. |        |        |        |       |       |
    | SKU21D          | image7a.png | A third box. | SKU21D |      5 |      8 |     5 | Box 3 |
    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+

Why are 2 of the rows not joined despite file0."Manufacturer ID" = file1.mpn being satisfied?


Solution

  • I'm going to simplify this down to an SSCCE so it's easy to understand and reproduce.

    I think you're over-complicating the solution here which is probably what's causing you so many weird bugs. The solution is actually rather simple. All you need to do is take each csv file, parse it into rows and columns, and then merge each of those rows into one row to write it back out to another csv.

    This is the simplest way to do it:

    $file0 = <<<'FILE0'
    Manufacturer ID,image,description
    SKU231,image1.jpg,A box.
    SKUAG1,image22.jpg,Another box.
    SKU21D,image7a.png,A third box.
    FILE0;
    
    $file1 = <<<'FILE1'
    mpn,length,height,width,title
    SKU231,22,14,10,Box 1
    SKUAG1,12,6,6,Box 2
    SKU21D,5,8,5,Box 3
    FILE1;
    
    $csv1 = array_map('str_getcsv', explode("\n", $file0));
    $csv2 = array_map('str_getcsv', explode("\n", $file1));
    
    $fd = fopen("/tmp/test.csv", 'w');
    
    
    foreach ($csv1 as $row => $columns) {
    
        $newRow = array_merge($columns, $csv2[$row]);
        fputcsv($fd, $newRow);
    
    }
    
    $fd = fopen("/tmp/test.csv", "r");
    
    while (($line = fgets($fd)) !== false) {
        echo $line;
    }
    

    The result:

    "Manufacturer ID",image,description,mpn,length,height,width,title
    SKU231,image1.jpg,"A box.",SKU231,22,14,10,"Box 1"
    SKUAG1,image22.jpg,"Another box.",SKUAG1,12,6,6,"Box 2"
    SKU21D,image7a.png,"A third box.",SKU21D,5,8,5,"Box 3"
    

    |Manufacturer ID | image       | description  | mpn    | length | height | width | title |
    |----------------|-------------|--------------|--------|--------|--------|-------|-------|
    | SKU231         | image1.jpg  | A box.       | SKU231 | 22     | 14     | 10    | Box 1 |
    | SKUAG1         | image22.jpg | Another box. | SKUAG1 | 12     | 6      | 6     | Box 2 |
    | SKU21D         | image7a.png | A third box. | SKU21D | 5      | 8      | 5     | Box 3 |