python mysql string-parsing

String parsing vs Database querying


I have a MySQL table with this data:

TempID    TempName       TempString 
1         aaa            34:56:23:45:67:55
2         bbb            12:56:67:45:33:99

I want to perform a calculation that requires splitting TempString into its individual values, such as 34, 56, 23, and so on.

Which is the better way?

  1. Parse the string in Python and use the resulting values
  2. Use a separate table that stores each value with its TempID

    ValID     TempID     Value
    1         1          34
    2         1          56
    3         1          23
    4         1          45
    5         1          67
    6         1          55
    7         2          12
    8         2          56
    9         2          67
    

    Then query the values for each TempID and do the calculation.

Kindly suggest the best way to do this.


Solution

  • The big question is whether it is semantically correct to break those values apart.

    When designing database tables, you need to think about the domain of every column. Every data item stored in a column should be atomic (non-decomposable), meaning that it is a single element of the domain defined for that column. This is referred to as a domain integrity constraint (see the Wikipedia article on data integrity for more information).

    You can use a simple heuristic to help make the final decision: do you need, or will you ever need, to use part of the stored string for searching, i.e. inside a WHERE clause? If yes, break the string up and store the values separately (in their own columns or rows); otherwise just parse it using Python.
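If the values are never used for searching, the Python side can stay very small. The following is a minimal sketch, assuming the rows have already been fetched as `(TempID, TempName, TempString)` tuples; the helper name `parse_temp_string` is made up for illustration, and `sum` is only a placeholder for whatever calculation is actually needed.

```python
def parse_temp_string(temp_string, sep=":"):
    """Split a delimited string such as '34:56:23' into a list of ints."""
    # Skip empty parts so a trailing separator does not raise ValueError.
    return [int(part) for part in temp_string.split(sep) if part]


# Sample rows copied from the question; in practice they would come
# from a database cursor.
rows = [
    (1, "aaa", "34:56:23:45:67:55"),
    (2, "bbb", "12:56:67:45:33:99"),
]

# Placeholder calculation: the sum of the values for each TempID.
totals = {temp_id: sum(parse_temp_string(s)) for temp_id, _name, s in rows}
print(totals)
```

Note that `int()` raises `ValueError` on non-numeric parts, which is probably what you want if the column is supposed to contain only numbers.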

    A simple example of breaking domain integrity is a table of blog posts in which all tags for a post are joined into one big string and stored in a single column, instead of using a separate table that connects posts to tags.
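To make the posts and tags case concrete, here is a rough sketch of the normalized layout. It uses Python's built-in `sqlite3` module with an in-memory database purely so that it runs without a server; the same schema idea carries over to MySQL. All table, column, and function names (`posts`, `tags`, `post_tags`, `titles_with_tag`) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags  (tag_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL UNIQUE);
    -- Junction table: one row per (post, tag) pair instead of one big string.
    CREATE TABLE post_tags (
        post_id INTEGER NOT NULL REFERENCES posts(post_id),
        tag_id  INTEGER NOT NULL REFERENCES tags(tag_id),
        PRIMARY KEY (post_id, tag_id)
    );
""")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, "First post"), (2, "Second post")])
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(1, "python"), (2, "mysql")])
conn.executemany("INSERT INTO post_tags VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def titles_with_tag(conn, tag_name):
    """Return the titles of all posts carrying the given tag."""
    rows = conn.execute(
        """
        SELECT p.title
        FROM posts p
        JOIN post_tags pt ON pt.post_id = p.post_id
        JOIN tags t ON t.tag_id = pt.tag_id
        WHERE t.name = ?
        ORDER BY p.post_id
        """,
        (tag_name,),
    ).fetchall()
    return [title for (title,) in rows]


mysql_posts = titles_with_tag(conn, "mysql")
python_posts = titles_with_tag(conn, "python")
print(mysql_posts, python_posts)
```

Because each tag is its own row, the search is a plain equality test in the WHERE clause rather than a substring match against a joined string.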

    An obvious example of when it is fine to store a relatively complex value in one column is a date. Even though it is composed of multiple values, a date is not complete (atomic) without all of its parts: year, month, day, and so on.