Search code examples
mysqldistancelevenshtein-distance

using levenshtein distance ratio to compare 2 records


I've created the mysql user function using the levenshtein distance and ratio source codes. I am comparing 2 records and based on a 75% match I want to select the record.

  1. Order comes into table paypal_ipn_orders with an ITEM title
  2. A query executes against a table itemkey to find a 75% match in a record called ITEM as well
  3. if a 75% title is match it assigns an eight digit number from table itemkey to table paypal_ipn_orders

Here is the query

      UPDATE paypal_ipn_orders
      SET sort_num = (SELECT sort_id
      FROM itemkey
      WHERE  levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) > 75)
      WHERE packing_slip_printed = 0 
      AND LOWER(payment_status) = 'completed' 
      AND address_name <> ''
      AND shipping < 100

I have adjusted this a few times but it's failing between line 4 and 5 at the levenshtein_ratio part. If it works it says that the subquery returns more than one row. I don't know how to fix it to make it return the correct result, I just lost as to how to make this work.


Solution

  • A subquery on a SET should only return one value. If itemkey has more than one item that is 75% of item_name what do you want to do? The below will use one of the best matches:

    UPDATE paypal_ipn_orders
    SET sort_num = (SELECT sort_id
      FROM itemkey
      WHERE  levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) > 75
      ORDER BY levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) DESC
      LIMIT 1)
    WHERE packing_slip_printed = 0 
      AND LOWER(payment_status) = 'completed' 
      AND address_name <> ''
      AND shipping < 100