Search code examples
excelexcel-formulaexcel-2013excel-match

Comparing unique strings of Excel data across worksheets


long time reader, first time pos(t)er of questions.

I have an Excel 2013 worksheet of about 4,000 unique records (rows) of data. We'll call this the data dump. I've filtered the data dump using any one of about six different data elements (columns). After each filter I save the results to a new worksheet. I clear the filter to start over, and ultimately wound up with about six different worksheets.

I need to be able to account for each unique record in the data dump--each one should (in theory) appear on at least one of the filtered worksheets, and I need to identify any that don't.

My big problem is that the only way to uniquely identify each record is by concatenating a text string out of five consecutive cells in each row. I cannot add a column of concatenated text to these worksheets (for which reasons I'll presently spare you), so essentially I'm trying to build a formula that says the following:

For a given, unique, concatenated string of text of five consecutive cells from one record on this data dump worksheet, identify any exact matching strings from any of the other worksheets and return TRUE if found or FALSE if not.

I will, of course, have to apply this formula to every record in the data dump.

Thoughts or tips? Ultimately I think it comes down to a lot of small moving parts that I could manage individually, but that I'm not confident I could manage collectively.

Any help is appreciated and I'll be happy to clarify where needed. And forgiveness if a similar question has been asked previously--I searched pretty fruitlessly for an answer all afternoon.

Thank you!


Solution

  • You could use Index to create a concatenated range that serves a lookup range to Match(). Match() can concatenate the lookup term. It then returns a number for a match or an error if no match is found. Wrap error trapping formulas around this for the TRUE/FALSE result. Along the lines of

    =iferror(match(sheet1!A1&sheet1!B1&sheet1!C1&sheet1!d1&sheet1!e1,index(Sheet2!$a$1:$a$1000&Sheet2!$b$1:$b$1000&Sheet2!$c$1:$c$1000&Sheet2!$d$1:$d$1000&Sheet2!$e$1:$e$1000,0),0),FALSE)

    Note that any match will return a number (which will evaluate to a boolean TRUE in summarising formulas) and a non-match will return a FALSE.

    This will get you the row number of the match for the first row of original data on sheet1, where the first extract lives on Sheet2 in the first 1000 rows. Use the same principle for the other four sheets and wrap the five formulas into an OR() statement to arrive at a final TRUE or FALSE.

    Note that the Index ranges should not encompass whole columns, but only the rows with data. Otherwise the formula will be very slow to recalculate, especially if you use it 4000 times.