database, user-input

How can I standardize user-entered data?


I have a table of data that I'm trying to "standardize". The data was entered as free text rather than being constrained (for example, by drop-down lists of answers), leaving me with multiple variations of answers where I want a single, universal answer.

For instance, let's say that there's a column in the database called "Type of pet". Because user input wasn't standardized, people could enter variations of a specific type of pet rather than the generalized form of the pet. So instead of just "Dog", there are different versions of dogs like "Collie", "Mutt", "Labrador", etc.

How do I go about translating these answers into their generalized form -- replacing Collie/Mutt/Labrador/etc. in the table with just "Dog" (or "Cat", or "Bird", etc.)?

I realize there needs to be some form of a manually-entered "translation" function. My gut reaction is that a long-spanning list of stacked if-statements would be inefficient, as well as being tedious to control and expand.

Is there some kind of process or system for doing something like this? Like some type of lookup table system/matrix?

I'm assuming a foreach loop to iterate through the array of records would be most appropriate. And then within each iteration of the foreach loop, you'd have it do a test/comparison of the pet variable against some type of list (that I would have created manually) -- but what would you use for this lookup table/list? Or this step of the process? Would you have it as some type of a SQL database/table, an array, a CSV file, etc.?

Then, once this comparison is completed and the "translated" equivalent of the type of pet is determined, the foreach loop would update that specific row, either overwriting the old non-standardized value or adding the new standardized equivalent in a new column (for later verification).
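The process described above could be sketched roughly as follows. This is only a minimal illustration, assuming the lookup table is a hand-built Python dict and the records are already loaded as a list of dicts. The names `PET_LOOKUP`, `pet_type`, and `pet_type_std` are hypothetical, and the same mapping could just as well live in a SQL table or a CSV file.

```python
# Hypothetical, manually maintained lookup table: variant -> generalized form.
# Keys are lowercase so that matching can ignore case.
PET_LOOKUP = {
    "dog": "Dog",
    "collie": "Dog",
    "mutt": "Dog",
    "labrador": "Dog",
    "cat": "Cat",
    "bird": "Bird",
}


def standardize_pet(raw_value, lookup=PET_LOOKUP):
    """Return the generalized pet type, or None if the value is not in the lookup."""
    key = raw_value.strip().lower()
    return lookup.get(key)


def standardize_records(records):
    """Write the standardized value into a new key and collect unmatched inputs.

    The original 'pet_type' value is left untouched so it can be verified later.
    """
    unmatched = set()
    for record in records:
        standardized = standardize_pet(record["pet_type"])
        record["pet_type_std"] = standardized
        if standardized is None:
            unmatched.add(record["pet_type"])
    return unmatched


# Example records (assumed shape, for illustration only).
records = [
    {"id": 1, "pet_type": "Collie"},
    {"id": 2, "pet_type": " mutt "},
    {"id": 3, "pet_type": "Iguana"},
]
unmatched = standardize_records(records)
```

Values that are not in the lookup are collected in `unmatched` rather than guessed at, so they can be reviewed by hand and added to the table as needed.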


Solution

  • My gut reaction is that a long-spanning list of stacked if-statements would be inefficient, as well as being tedious to control and expand.

    100% correct, and because of this you really only have one option: manually go through the database and clean it up. Once that is done, you will need to restrict user input by using drop-down lists rather than raw text input.

    Depending on your users, you might want to look at how Stack Overflow does tags, which essentially allows anyone to do the cleanup for you.
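To illustrate the "restrict user input" step, here is a minimal sketch of a server-side check that would back up a drop-down list. It assumes the allowed values are a fixed tuple. `ALLOWED_PET_TYPES` and `validate_pet_type` are hypothetical names, not part of any particular framework.

```python
# Hypothetical fixed list of answers, the same values a drop-down would offer.
ALLOWED_PET_TYPES = ("Dog", "Cat", "Bird")


def validate_pet_type(value):
    """Return the value unchanged if it is an allowed answer, else raise ValueError."""
    if value not in ALLOWED_PET_TYPES:
        raise ValueError(
            f"Unknown pet type {value!r}; expected one of {ALLOWED_PET_TYPES}"
        )
    return value
```

Checking on the server as well as in the form means a value that bypasses the drop-down still cannot reach the table.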