Search code examples
jsonspotfire

Spotfire and Regex- Extracting Several Values Stored in One Column


I'm needing some help with a RegEx expression that can split some of the values of one column into multiple columns of data. The background on this issue is that this dataset was pulled from a MySQL source that had this particular column stored as a json column, but json columns do not come into Spotfire, so I had to do a cast function just to bring it in.

Let's call this column [Many Values]. Column [Many Values] stores the same information format for all rows, but the data values in it can be different from row to row. As an example, this is what [Many Values] looks like:

[Many Values]

Row 1: {"ValueA":100,"ValueB":50,"ValueC":20,"ValueD":0.09,"ValueE":108.5...}
Row 2: {"ValueA":45,"ValueB":20,"ValueC":76,"ValueD":10.20,"ValueE":null...}
Row 3: {"ValueA":-10,"ValueB":200,"ValueC":40,"ValueD":27.1666,"ValueE":61.23...}

[Many Values] has a lot of information stored in it, but I'd like to have a calculated column (or multiple calculated columns) that will look something like this for Row 1:
[ValueA Extracted]: 100
[ValueB Extracted]: 50
[ValueC Extracted]: 20
...

Is there a good RegEx expression that I can use to pick out what values I need and pair them with the numbers that belong? Or should I consider a different approach? Thanks for the help!


Solution

  • This might want to be automated with some scripting, depending on how many columns there are to be extracted.

    I have taken your 3 rows and calculated the following columns:

    Json2: RXReplace([Json_string],"(\\{|\\})","","g")
    

    Where [Json_string] is the original column with the JSON. This removes the {}.

    Then for each column in the string (there were A B C D and E) I have created a calculated column like:

    Real(RXReplace(Split([Json2],",",1),'"ValueA":','','g'))
    

    This is for column A. For B you use 2 instead of 1, and ValueB instead of ValueA. For C you use 3 and ValueC. Etc.

    The Real() takes automatically care of the nulls.