Spotfire: DataFunctionError: Output variable was not defined


I want to create a calculated column with a Python data function in Spotfire. This is my script:

import pandas as pd

def calculate_passage(df, batch_col, date_col):
    df['RowIndex'] = df.groupby(batch_col).cumcount() + 1
    df['Passage'] = 0

    for batch, batch_df in df.groupby(batch_col):
        passage = 1
        same_date_count = 0
        prev_date = None

        for index, row in batch_df.iterrows():
            if row['RowIndex'] <= 6:
                passage = 1
            elif row['RowIndex'] <= 10:
                passage = 2
            else:
                if row[date_col] == prev_date:
                    same_date_count += 1
                else:
                    same_date_count = 0

                if same_date_count == 1:
                    passage += 1

            df.loc[index, 'Passage'] = passage
            prev_date = row[date_col]

    # Remove the 'RowIndex' column from the output DataFrame
    df.drop(columns=['RowIndex'], inplace=True)
    passage = df[['Passage']] 

    # Return a DataFrame with a single column 'Passage'
    return passage

My parameters: [screenshot: input parameters]

[screenshot: output parameters]

And these are my mapping settings when I click 'Run': [screenshot: mapping pars]

And still, I get the error "spotfire.data_function.DataFunctionError: Output variable 'passage' was not defined"...


Solution

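  • You are defining a function within the data function, but you never call it. You need a main statement that calls this internal function (calculate_passage) with the input parameters sent into the data function and assigns the result to a script-level variable named passage. That variable is what Spotfire looks for as the output; the passage inside calculate_passage is local to the function, which is why the error says it was not defined. In most cases, you won't need an explicit return statement in the main part (but inside calculate_passage you obviously do).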

    I don't have your data and cannot see your inputs. This simple data function worked for me:

    import pandas as pd
    import numpy as np
    
    def calculate_passage(df):
        df['Passage'] = np.random.randint(0, 99, df.shape[0])
        passage = df[['Passage']] 
        # Return a DataFrame with a single column 'Passage'
        return passage
    
    ### Main
    passage = calculate_passage(df)
    

    The other potential problem is that you are defining passage as a table (a one-column DataFrame), not a column. With a Table output it will be created as a new, separate data table. If you instead output a single column, for example df['Passage'], and set the output parameter type to Column, it should be added as an extra column to df, provided you have not reordered or reduced the number of rows within the calculate_passage function.
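
As a rough sketch of both points applied to the script from the question: the function is called from a main statement, and it returns a single column rather than a one-column table. The sample DataFrame and the names 'Batch' and 'Date' are made up so the snippet runs outside Spotfire. It assumes df is a Table input and that batch_col and date_col are Value inputs holding column names; the actual parameter setup in the question is not visible, so adjust accordingly.

```python
import pandas as pd

# Stand-ins for the inputs Spotfire would inject into the script.
# Assumption: df is a Table input; batch_col and date_col are Value
# inputs holding column names. Data and column names are invented.
df = pd.DataFrame({
    'Batch': ['A'] * 12 + ['B'] * 2,
    'Date': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9',
             'd10', 'd10', 'd11', 'd1', 'd2'],
})
batch_col = 'Batch'
date_col = 'Date'


def calculate_passage(df, batch_col, date_col):
    df = df.copy()  # work on a copy so the input table is left untouched
    df['RowIndex'] = df.groupby(batch_col).cumcount() + 1
    df['Passage'] = 0

    for batch, batch_df in df.groupby(batch_col):
        passage = 1
        same_date_count = 0
        prev_date = None

        for index, row in batch_df.iterrows():
            if row['RowIndex'] <= 6:
                passage = 1
            elif row['RowIndex'] <= 10:
                passage = 2
            else:
                if row[date_col] == prev_date:
                    same_date_count += 1
                else:
                    same_date_count = 0
                if same_date_count == 1:
                    passage += 1

            df.loc[index, 'Passage'] = passage
            prev_date = row[date_col]

    # Return a single column (a Series) in the original row order
    return df['Passage']


### Main: this script-level assignment defines the output variable
passage = calculate_passage(df, batch_col, date_col)
```

In Spotfire, the stand-in block at the top would be dropped, since the inputs come from the parameter mapping, and the output parameter passage would be set to type Column instead of Table.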