dynamics-crm-2011, dynamics-crm

Dynamics CRM 2011 Import Data Duplication Rules


I have a requirement in which I need to regularly import data from Excel (CSV) files into Dynamics CRM.

Instead of using simple data duplication rules, I need to implement a point system to determine whether a record is considered a duplicate or not.

Let me give an example. Suppose these are the duplicate-detection rules for an import:

  1. First Name, exact match, 10 pts
  2. Last Name, exact match, 15 pts
  3. Email, exact match, 20 pts
  4. Mobile Phone, exact match, 5 pts

And the threshold value is 19 pts.

Now, if a record has both First Name and Last Name matching an existing record in the entity, its score will be 25 pts, which is higher than the threshold (19 pts), so the record is considered a duplicate.

If, for example, a record has only the same First Name and Mobile Phone, the score will be 15 pts, which is lower than the threshold, so it is considered a non-duplicate.
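For illustration, the scoring logic could be sketched like this (a minimal Python sketch; the dictionary keys and the treatment of a score exactly at the threshold are my assumptions, since the example only covers scores clearly above or below it):

```python
# Point-based duplicate check matching the example rules above.
# Field names are illustrative; in CRM these would be attribute
# names on the contact entity.

RULES = {
    "first_name": 10,
    "last_name": 15,
    "email": 20,
    "mobile_phone": 5,
}
THRESHOLD = 19

def duplicate_score(new_record, existing_record):
    """Sum the points of every field that matches exactly (and is non-empty)."""
    return sum(
        points
        for field, points in RULES.items()
        if new_record.get(field) and new_record.get(field) == existing_record.get(field)
    )

def is_duplicate(new_record, existing_record, threshold=THRESHOLD):
    # Assumption: meeting or exceeding the threshold counts as a duplicate.
    return duplicate_score(new_record, existing_record) >= threshold

existing = {"first_name": "John", "last_name": "Doe",
            "email": "john@example.com", "mobile_phone": "555-0100"}

# First Name + Last Name match: 10 + 15 = 25 pts -> duplicate
print(is_duplicate({"first_name": "John", "last_name": "Doe"}, existing))          # True

# First Name + Mobile Phone match: 10 + 5 = 15 pts -> not a duplicate
print(is_duplicate({"first_name": "John", "mobile_phone": "555-0100"}, existing))  # False
```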

What is the best approach to achieve this requirement? Is it possible to utilize the default Import Data functionality in MS CRM? Is there any 3rd-party add-on that answers my requirement above?

Thank you for all the help.

Updated

Hi Konrad, thank you for your suggestions, let me elaborate here:

Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.

Nice one, but I don't think it is really workable in my case: the data will be coming in regularly from the client in moderate volumes (hundreds to thousands of records), and typically the client won't check the data for duplicates.

Workflow. Run a process removing any instance calculated as a duplicate.

A workflow is a good idea; however, since it is processed asynchronously, my concern is that in some cases the user may already have made updates or changes to the inserted data before the workflow finishes, creating data inconsistency or at the very least a confusing user experience.

Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel its creation (or mark it for removal).

I like this approach. So I just import as usual (for example, into the contact entity), but I already have a plugin in place that gets triggered every time a record is created; the plugin checks whether the record is duplicate-ish or not and takes the necessary action.
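The create-time check such a plugin would perform might look like this (a Python stand-in for the actual C# plugin, just to show the control flow; `find_candidates` and the rules are illustrative assumptions, and in a real pre-operation plugin the candidates would come from an OrganizationService query):

```python
# Sketch of the duplicate check a create-time plugin would run.
# Raising an exception stands in for cancelling the creation,
# as a pre-operation CRM plugin would.

RULES = {"first_name": 10, "last_name": 15, "email": 20, "mobile_phone": 5}
THRESHOLD = 19

class DuplicateRecordError(Exception):
    """Raised to cancel creation of a record that scores as a duplicate."""

def score(new, old):
    return sum(points for field, points in RULES.items()
               if new.get(field) and new.get(field) == old.get(field))

def on_create(new_record, find_candidates):
    # find_candidates: callable returning existing records worth comparing
    # against (in CRM, typically a filtered query on the contact entity).
    for old in find_candidates(new_record):
        if score(new_record, old) >= THRESHOLD:
            raise DuplicateRecordError("record scores as a duplicate")
    return new_record  # allowed through

existing = [{"first_name": "John", "last_name": "Doe",
             "email": "john@example.com", "mobile_phone": "555-0100"}]

try:
    on_create({"first_name": "John", "last_name": "Doe"}, lambda r: existing)
except DuplicateRecordError as e:
    print("creation cancelled:", e)
```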


Solution

  • I can think of the following approaches to the task (depending on the number of records, the repetitiveness of the import, automation requirements, etc.), each of which may be suitable in some way. Would you care to elaborate on the current conditions?

    1. Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.

    2. Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel its creation (or mark it for removal).

    3. Workflow. Run a process removing any instance calculated as a duplicate.

    You also need to consider the implications of such elimination of data. There's a mathematical issue. Suppose that the uniqueness radius (i.e. the threshold in this 1D case) is 3. Consider the following set of numbers (it's listed twice, just in different order).

    1 3 5 7    ->    1 _ 5 _
    3 1 5 7    ->    _ 3 _ 7
    

    Are you sure that's the intended result? Under some circumstances, you can even end up with result sets of different sizes (depending only on the order). I'm a bit curious about why and how this setup came up.
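    The order-dependence above can be reproduced with a short greedy-deduplication sketch (my illustration, using 1D distance in place of the point score):

    ```python
    # Greedy deduplication over a 1D set: a value is dropped if it lies
    # within the uniqueness radius (3) of any value already kept.
    # This mirrors the 1 3 5 7 example above.

    RADIUS = 3

    def dedupe(values, radius=RADIUS):
        kept = []
        for v in values:
            if all(abs(v - k) >= radius for k in kept):
                kept.append(v)
        return kept

    print(dedupe([1, 3, 5, 7]))  # [1, 5]
    print(dedupe([3, 1, 5, 7]))  # [3, 7]
    ```

    The surviving set depends entirely on the processing order, which for an import is essentially arbitrary.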

    Personally, I'd go with the plugin, if the above is OK with you. If you need to make sure that some of the unique-ish elements never get omitted, you'd probably be best off applying a test algorithm to a backup of the data first. However, that may defeat its purpose.

    In fact, it sounds so interesting that I might create the solution for you (just to show it can be done) and blog about it. What's the deadline?