php api csv fopen

Importing a CSV via PHP and treating duplicate lines as 2 entries


For personal stats purposes, every 30 minutes I fetch the last 500 sales from an API endpoint that returns them as a CSV.

The CSV is made like this:

customerID Timestamp Sale Commission SignupDate
3656323 January 12 2022, 23:42 10.00 2.00 January 23 2019, 12:34
1456324 January 12 2022, 21:42 100.00 20.00 December 15 2018, 21:34
1456324 January 12 2022, 21:42 100.00 20.00 December 15 2018, 21:34
4213516 January 12 2022, 18:42 120.00 24.00 July 25 2020, 13:34

Since I call the same endpoint every 30 minutes, most of the lines reappear from one call to the next (two calls in a row may even return the exact same table). As a result, I removed all duplicates by checking the customer ID and timestamp, and I thought that was enough.

My problem is this: as you can see in the example above, two lines have the same customer, timestamp and amount. This is not a mistake; it's actually a double purchase. With my current check, the second line is not stored because it is treated as a duplicate. I would like to include it in my stats, but I'm not sure how.

Is there any known solution to this kind of specific issue?

Here is the actual code that I have (the data example above may differ from what the code expects):

$handle = fopen('https://url.com/csv/?limit=500','r');
if($handle){
  $flag = true; // true until the header row has been skipped
  while ( ($data = fgetcsv($handle) ) !== false ) {
    if($flag) { $flag = false; continue; }
    // Normalize both dates to MySQL DATETIME format
    $timestamp   = date("Y-m-d H:i:s", strtotime($data[0]));
    $signup_date = date("Y-m-d H:i:s", strtotime($data[5]));
    $amount      = str_replace(".", ".", $data[2]);
    $commission  = str_replace(".", ".", $data[3]);
    // Skip rows already stored for this timestamp and customer ID
    $exists = money_line_exists($timestamp, $data[1]);
    if(($exists == 0) && ($amount != "0.00") && (is_numeric($data[1]))){
      add_new_money($timestamp, $data[1], $amount, $commission, $data[4], $signup_date, $data[6], 1);
    }
  }
  fclose($handle);
}

And the money_line_exists() function is as below:

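function money_line_exists($date, $id){
    $db = connect_db();
    // Prepared statement keeps CSV values out of the SQL string
    $stmt = $db->prepare("SELECT 1 FROM money WHERE date_bill = ? AND user_id = ? LIMIT 1");
    $stmt->bind_param('ss', $date, $id);
    $stmt->execute();
    $stmt->store_result();
    $exists = ($stmt->num_rows == 0) ? 0 : 1;
    // Release the statement and connection before returning
    $stmt->close();
    $db->close();
    return $exists;
}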

Thanks in advance

Note: This is a repost for clarity and formatting.


Solution

  • So is it correct to say that duplicates within a single response should be kept, and duplicates across responses should be removed?

    Correct. Duplicates within a response I keep; duplicates across responses I remove.

    If that's the case, do not insert the rows until you have parsed the whole response. That way each existence check only runs against rows inserted by earlier calls, never against rows from the response you are currently parsing.

      $new_lines = [];
      while ( ($data = fgetcsv($handle) ) != FALSE ) {
        // ...
        $exists = money_line_exists($timestamp, $data[1]);
        if(($exists == 0) && ($amount != "0.00") && (is_numeric($data[1]))){
          $new_lines[] = [
             $timestamp,
             $data[1],
             $amount,
             $commission,
             $data[4],
             $signup_date,
             $data[6],
             1
          ];
        }
      }
      foreach ($new_lines as $line) {
          add_new_money(...$line);
      }
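
    To see the effect without a database, here is a minimal self-contained sketch of the same two-phase idea. It assumes an in-memory array in place of the money table; import_response(), $stored and the three-column rows are hypothetical names and data made up for illustration, not part of the original code.

```php
<?php
// Hypothetical in-memory stand-in for the `money` table,
// grouped by a "user_id|timestamp" key. Illustration only.

/**
 * Import one API response.
 * Rows whose key was stored by an earlier response are skipped;
 * repeated rows inside the same response are all kept.
 *
 * @param array $rows   Parsed CSV rows: [user_id, timestamp, amount]
 * @param array $stored Previously imported rows, grouped by key (modified in place)
 * @return int Number of rows inserted from this response
 */
function import_response(array $rows, array &$stored): int
{
    $new_lines = [];

    // Phase 1: check every row against rows stored by earlier calls only.
    foreach ($rows as $row) {
        $key = $row[0] . '|' . $row[1];
        if (!isset($stored[$key])) {
            $new_lines[] = $row;
        }
    }

    // Phase 2: insert only after the whole response has been checked.
    foreach ($new_lines as $row) {
        $key = $row[0] . '|' . $row[1];
        $stored[$key][] = $row;
    }

    return count($new_lines);
}

$response = [
    ['3656323', '2022-01-12 23:42:00', '10.00'],
    ['1456324', '2022-01-12 21:42:00', '100.00'],
    ['1456324', '2022-01-12 21:42:00', '100.00'], // genuine double purchase
    ['4213516', '2022-01-12 18:42:00', '120.00'],
];

$stored = [];
$first  = import_response($response, $stored); // every row is new
$second = import_response($response, $stored); // same response fetched again
echo "first: $first, second: $second\n";
```

    Running it prints "first: 4, second: 0": both lines of the double purchase are stored on the first call, and fetching the identical response again adds nothing.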