I am trying to extract all cell/range addresses appear with in a formula in a Google Sheets cell.
Formulas by nature might be very complicated. I tried many patterns, that works in web testers, but not on google sheets re2.
The following example shows two issues. Maybe I misreading the matching results, but as I understand the are 4 Matches.
Formula (ignore the logic):
=A$13:B4+$BC$12+$DE2+F2:G2
Regex:
((\$?[A-Z]+\$?\d+)(:(\$?[A-Z]+\$?\d+))?)
Expected result:
[A$13:B4,$BC$12,$DE2,F2:G2]
Here (if I am not misreading the results) it looks OK. I am not sure if the group matching displayed are also considered as matches as it is stated "4 matches, 287 steps"
However in google sheets returns all Match 1 results
[A$13:B4,A$13,:B4,B4]
The other matches are ignored So I guess the question is how to convert the regex to re2 syntax?
Update: Following player0 comments, maybe I was not clear. This is only a simple example, to isolate other issues I have. This one is just a string containing addresses in few relative and absolute formats. However, I am looking for a wider general solution that will fit any possible formulas that might contain formulas and references to other sheets. For example:
=(STDEVA(Sheet1!B2:B5)+sum($A$1:$A$2))*B2
Expected results here is Sheet1!B2:B5,$A$1:$A$2,B2
This formula contains two formulas and reference to another sheet. Still ignoring here from Named Ranges and other formula possible references that I am currently can not think of. Also, the square brackets [] are irrelevant, it was just way to display the results, and actually is copied from Logs as it is all done within a script.
I figured out a better way without splitting, by using the /g
flag.
However, this works in a script and not by using Sheets internal regex function (ie REGEXEXTRACT
), as I couldn't figure out how to format the regex string within a cell that will contain the /g flag and REGEXEXTRACT
will accept as a valid regex.
Here's the code:
/* Find all predessesor cells of input range
*/
function findPredecessor(rng){
var formualaText = rng.getFormula();
/* addMatchesRegex
* supports all A1Notation addresses
* the 2nd regex after the | operator will match all column addresses (A:A, Sheet2!b:B, etc)
* some NamedRanges with names like NameRange1
* Does not support - NamedRange with names including dot, not ending with digits
*/
var addMatchesRegex = /(([\w .'!]+)?(\$?[A-Z]+\$?\d+\b)(:(\$?[A-Z]+\$?\d+))?)|([\w .'!]+)?[A-Z]+:[A-Z]+/gi;
var addMatches = formualaText.match(addMatchesRegex);
Logger.log("%s add matched: %s",addMatches.length,addMatches);
/* fullMatchRegex
* modify addMatches to return also strings like
* 1. SUM, IFERROR, etc - internal sheets functions.
* 2. NamedRanges
*
*/
var fullMatchRegex = /(([\w .'!]+)?([\$A-Z.\d]*)(:(\$?[A-Z]+\$?\d*))?)/gi;
// match regex with formula
var fullMatches = formualaText.match(fullMatchRegex);
Logger.log("Full matches list: %s",fullMatches);
var namedRangesAdd = analyzeMatch(addMatches,fullMatches);
Logger.log("%s total predecessors: %s",namedRangesAdd.length,namedRangesAdd);
}
/* This function accepts the two regex matches list
* and returns one unique list of all predecessor addresses
* @param {Array} addMatches - All A1 notation addresses
* plus some of NamedRanges
* @param {Array} fullMatches - All A1 notation addresses,All NamedRanges,
* Other irrelevent matches
*/
function analyzeMatch(addMatches,fullMatches){
/*Expected
First parameter - holds all A1Notation addresses as well as NamedRanges that
their name in the form of /[A-Z]+/d+
NamedRange with name including dot(.) or does not contain digits will not
be on the list
Second Parameter - contains all first list matches, as well as all NamedRanges
names and also irrelevant matches to be filtered like function names and empty string
*/
//Full Matched Addresses to be returned
var mAddresses = [];
//Remove duplicate addresses
var uniqueMatches =
addMatches.filter((item,index)=>addMatches.indexOf(item)===index);
//Get all named Ranges in spread sheet
var nr = SpreadsheetApp.getActive().getNamedRanges();
// Loop Named Ranges arr
nr.forEach(function(item){
/* Check if the name of the current Named Range
* is included in matches
* 1. first in addMatches list
* 2. only if not found in the wider list */
var name = item.getName();
//Check if in addmatches array
var i = uniqueMatches.indexOf(name);
//Build A1Notation address of current NamedRange
var rng = item.getRange();
var add = "'" + rng.getSheet().getName() + "'!" + rng.getA1Notation();
if (i > -1){
//Add the address of curr NamedRange to final list
mAddresses.push(add);
//Remove curr NamedRange from list
uniqueMatches.splice(i,1);
}else if (fullMatches.includes(name)){
// Name found - add the address of the
// Named Range to matched Addresses list
mAddresses.push(add);
}
});
//Add all left matched addresses to final list
mAddresses.push(...uniqueMatches);
return mAddresses;
}
What makes it a bit complicated are NamedRanges. This code will match and analyze and returns one list with all predecessors addresses including the addresses of the NamedRanges.