Search code examples
excelexcel-formulaexcel-lambda

Create custom cell reference/array in formula without enclosing string parameters in quotation marks


For work I created a workbook that references data from another tab, does some filtering and/or calculations with the data, then plots it. The workbook also has form controls for the user to change what site (geographic) the data is from, what year the data is from, and more.

My first version of the workbook was very slow to load the plots due to the number of functions that have to re-run every time the user selects a different option in the form controls. So, instead I've been learning how to use the LAMBDA & related functions to create my own formulas & speed up my workbook.

My current issue is that I cannot figure out how to create my own cell reference/array in my LAMBDA function without the user having to enter quotations around the string input parameters. The answer may be out there, but I was having great difficulty finding it because most of the questions involve VBA and I would like to not have any VBA in this workbook (I need to be able to undo any changes to the sheets, which using VBA overrides).

I would like to recreate my existing formula using LAMBDA and for it to allow the user to select the columns for the array without the user typing quotation marks. My current formula:

=VALUE(IF($C$16=5,INDIRECT("E5:G"&$C$25),FILTER(INDIRECT("E5:G"&$C$25),YEAR(INDIRECT("E5:E"&$C$25))=$C$17,NA())))

Where:

  • Datetime values are in Column E and my data is in Columns F&G
  • C16=5 means the user selected the 5th option in the Year form control, in which 5 = All Years
  • C25 provides the number for the last row of data for columns E-G
  • C17 provides the calendar year

I started with trying to turn just the INDIRECT function into a LAMBDA function knowing it can be nested within a larger LAMDA function later. I tried rewriting this formula using CHAR( ) and TEXTJOIN( ) and double quotations and other methods, but cannot get the result I want. Here is what I have:

=LAMBDA(input_start_col,input_end_col,input_end_row,INDIRECT(input_start_col&"5:"&input_end_col&input_end_row))("E","G",$C$25)

However, I do not want the user having to type the quotations in ("E","G",$C$25) even though I know excel will only correctly read the E and G as strings when in quotation marks. I was wondering if there was a way to add the quotation marks into the LAMBDA/INDIRECT function itself, so the user only needs to type (E,G,$C$25).

The reason I am doing this instead of just using INDIRECT("E5:G"$C$25) and manually switching out the E, G, and C25 is because:

  1. I use this format of INDIRECT in multiple different formulas and would like to make it a named function to speed up my workbook,
  2. This workbook is going to be a template that other people will use, possibly even after I've left this position, and I want it to be as easy for the user as possible. Just let them click a few cells and enter a few letters without having to know the ins and outs of the formula.

Edit:

I have 12 tabs (1 for each "scenario") in my workbook where I copy & pasted the datasets (output data for each scenario from a modeling software). Here is an example of one of those tabs, showing the first 10 rows of data:

Screenshot of excel sheet showing first 10 rows of data. Data spans columns B-E, G-J, and L-O.

The Daily Min Stage data ends at row 1463 and the Hourly Mean Flow & Stage data both end at row 35067.

I have 6 plots I am making with this data, but right now I am focusing on just the Daily Min Stage data to make a time series plot. The user is able to select options in a drop down (form control) to change which 2 scenarios are plotted against each other, the site location, the year (2020, 2021, 2022, 2023, or All), and the operating season (whether to view all data, only the data during the operating season, or only the data outside the operating season). The plot will automatically update to show the data associated with the chosen selections; the functions are written to change where the referenced data comes from based on those form controls.

Here is the tab of data & calculations for the time series plot:

Screenshot of excel sheet showing data for each step of calculations, this data is used to plot the time series chart (not shown). The sheet also shows the background/reference information used in the calculations.

In Step 1 I used LAMBDA & LET to create a named function that uses the IF, INDIRECT, and ADDRESS functions. The function takes user inputs for Scenario # and Tab Name, then depending on which Location it "pulls" the appropriate data from the associated tab & "pastes" it into Columns F & G. So, according to the screenshot the function entered in F5:F1465 knows it needs to find the "Study 2b" data, which is in tab 2b, and it knows to select the Site B data (Column D in tab 2b) and paste it to Column F in my calculations tab.

Step 2 filters the data in Columns E-G by the user selected year(s). Step 3 filters the data in Columns I-K by the user selected operating season option. The data in Columns M-Q are then plotted in my time series chart.

So, that is all to say that:

  • The range of data used to plot the 6 different charts are dynamic.
  • There is 1 tab of calculations for each plot, meaning there are 6 tabs of calculations. Each tab of calculations has at least 3 "steps", most of which have complex functions and all steps after step 1 are also working with dynamic ranges.
  • Every time the user changes one of the form controls to change which year of data is plotted, what site the data is from, etc. all steps for all 6 tabs of calculations must run again before the data is plotted.

The highly dynamic aspect of my worksheet is why I used the INDIRECT function in the first place and is also why I am trying to use LAMBDA to speed up the process of running all of those calculations every time a form control option is changed.

I guess I could make a named range for every single step for each of the 6 calculations tabs instead of using INDIRECT. But I don't really want to make 20+ named ranges (and then future users must add more named ranges themselves if they want to add more plots & calculations tabs) when I can just make 1 named function using INDIRECT (or another function) and let the user select which Columns (AKA which step in a specific tab) to run the function on.


Solution

  • I found a "solution" to my issue. It does not do exactly what I wanted, but it is close enough.

    First, I made a subfunction (so I could use it for any other function I need down the line) that just finds the address (relative cell reference) of whatever cell is input.

    func_ref_cell_relative = LAMBDA(input_cell, ADDRESS(ROW(input_cell), COLUMN(input_cell),4))
    

    Second, I made a subfunction that outputs the letter of the column for the input cell.

    func_col_from_ref = LAMBDA(input_col_letter, SUBSTITUTE(ADDRESS(1, COLUMN(input_col_letter), 4), "1", ""))
    

    Third, I made a function that creates an array from 3 user inputs: the cell where the array starts, the first cell of data in the array's end column, and the last row of data for the array (I have a cell that outputs this number, so the user can just select this cell as the third input).

    eqn_create_array_ref = LAMBDA(array_start_cell, array_end_col, array_end_row, INDIRECT(func_ref_cell_relative(array_start_cell) & ":" & func_col_from_ref(array_end_col) & array_end_row))
    

    So, my data starts in E5 and ends in column G, and in cell C25 it says my data ends on row 1465. The user types out =eqn_create_array_ref(E5,G5,$C$25) and the function evaluates that to be INDIRECT(E5:G1465).