Search code examples
analyticsformulaformulasalteryx

Why is this Formula for Alteryx returning 0's instead of averages


I was wondering what is wrong with the following formula.

IF [Age] = Null() THEN Average([Age]) ELSE [Age] ENDIF

What I am trying to do "If the cell is blank then fill the cell with the average of all other cells called [Age].

Many thanks all!


Solution

  • We do a lot of imputation to correct null values during our ETL process, and there are really two ways of accomplishing it.

    The First Way: Imputation tool. You can use the "Imputation" tool in the Preparation category. In the tool options, select the fields you wish to impute, click the radio button for "Null" on Incoming Value to Replace, and then click the radio button for "Average" in the Replace With Value section. The advantages of using the tool directly are that it is much less complicated than the other way of doing it. The downsides are 1) if you are attempting to fix a large number of rows relative to machine specs it can be incredibly slow (much slower than the next way), and 2) it occasionally errors when we use it in our process without much explanation.

    The Second Way: Calculate averages and use formulas. You can also use the "Summarize" tool in the Transform category to generate an average field for each column. After generating the averages, use the "Append" tool in the Join category to join them back into the stream. You will have the same average values for each row in your database. At that point, you can use the Formula tool as you attempted in your question. E.g. IF [Age] = Null() THEN [Ave_Age] ELSE [Age] ENDIF

    The second way is significantly faster to run for extremely large datasets (e.g. fixing possible nulls in a few dozen columns over 70 million rows), but is much more time intensive to set up and must be created for each column.