Search code examples
stringdatecomparisonmissing-dataspotfire

Returning only the unique values from two strings in Spotfire


I am trying to compare two columns a + b. Column A is populated with a string of data and Column B with another string of data. I am using the below formula in a custom column to return unique values yet it does not appear to work for instances. Formula being used to identify the unique values:

right(RXReplace([expected dates],[dates],"","g"),Len(RXReplace([expected dates],[dates],"","g")))

[sample of the data being compared

this formula sometimes returns all the dates in the expected dates column when those dates exist in the dates column.

I have tried this formula

 right(RXReplace([expected dates],[dates],"","g"),Len(RXReplace([expected dates],[dates],"","g"))). 

I was expecting the dates which were missing in the dates column to be returned [sample of what was expected to be returned


Solution

  • I am not sure you can get this result with a simple expression, as you are effectively comparing two arrays. I am suggesting to create a calculated column using a TERR expression function, if that is available to you.

    I was also not sure of what you expected in return. Do you want only the elements in [expected dates] that are not in [dates], or do you want all the differences, i.e. including the dates in [dates] that are not in [expected dates] (extra dates)?

    The code I am suggesting is:

    checkDifference = function(A,B) {
        Av=strsplit(A,',')[[1]]
        Bv=strsplit(B,',')[[1]]
        Cv=c(setdiff(Bv,Av),setdiff(Av,Bv))
        C=paste(Cv,collapse=',')
        return (C)
    }
    
    output=mapply(FUN=checkDifference,input1,input2)
    

    The function first splits both columns into vectors, applies a set difference, joins the resulting vector back to a string and returns it. A TERR expression function assumes inputs and outputs are entire columns of data, that is why I used the mapply function to apply the formula row by row.

    In this case I am looking for all differences, if you only want the elements in [expected dates] that are not in [dates] you would replace this line, which concatenates both differences:

    Cv=c(setdiff(Bv,Av),setdiff(Av,Bv)
    

    with this, i.e. only the set of items that are in Bv but not Av:

    Cv=setdiff(Bv,Av)
    

    In order to put this code into Spotfire, you would from the top menu choose Data > Data function properties > Expression functions tab > New..

    Then in the editor copy and paste the code, give it a name e.g. TERR_checkDates, Function type=Column function, Return type=String, Category=Text functions.

    This function will now be available in the list of Functions that can be used when you create a new calculated column. You should find it in the list of available functions.

    Create the new column with this expression:

    TERR_checkDates([dates],[expected dates])