I am working on a spotfire tool, and I am using a calculated column in my main data table to group data rows into 'families' through a regex match. For example, one row might have a 'name' of ABC1234xyz, so it would be part of the ABC family because it contains the string 'ABC'. Another rows could be something like AQRST31x2af, and belong to the QRST family. The main point is that the 'family' is decided by matching a substring in the the name, but that substring could be any length, and isn't necessarily the beginning of the name string.
Right now I am doing this by a large nested If statement with a calculated column. However, this is tedious for adding new families, and maintaining the current list of families. What I would like to do is create a table with 2 columns, the string match and the family name. Then, I would like to match from this table to determine family instead of the nested if. So, it might look like the below tables:
Match Table:
id_string | family
----------------------
ABC | ABC
QRST | QRST
SUP | Super
Main Data Table:
name | data | family
---------------------------------------
ABC1234 | 1.02342 | ABC
ABC1215 | 1.23749 | ABC
AQRST31x2af | 1.04231 | QRST
BQRST32x2ac | 1.12312 | QRST
1903xSUP | 1.51231 | Super
1204xSUP | 1.68123 | Super
If you have any suggestions, I would appreciate it.
Thanks.
As @ksp585 mentioned, it doesn't seem like Spotfire can do exactly what I want, so I have come up with a solution using IronPython. Essentially, here is what I have done:
Created a table called FAMILIES, with the columns IDString and Family, which looks like this (using the same example strings above):
IDString | Family
------------------------
ABC | ABC
SUP | Super
QRST | QRST
Created a table called NAMES, as a pivot off of my main data table, with the only column being NAME. This just creates a list of unique names (since the data table has many rows for each name):
NAME
------------------------
ABC1234
ABC1215
AQRST31x2af
BQRST32x2ac
...
Created a Text Area with a button labeled Match Families, which calls an IronPython script. That script reads the NAMES table, and the FAMILIES table, compares each name to the IDString column with a regex, and associates each name with a family from the results. Any names that don't match a single IDString get the family name 'Other'. Then, it generates a new table called NAME_FAMILY_MAP, with the columns NAME and FAMILY.
I can then add families to the FAMILIES table using another script, or by just replacing the FAMILIES table with an updated list. It's slightly more tedious than what I was hoping, but it works, so I'm happy.