Search code examples
excelexcel-formuladuplicatesdistinctworksheet

Excel formula to find duplicates from multiple sheets and list down on a new sheet or master sheet


I have an excel workbook which stores Employee data. It has 3 sheet which has data coming from 3 different sources. All the 3 sheet has the Employee ID column as common which is ideally the first column.

Now I am looking for a formula which could find Employee ID which are present on all 3 sheets and list down in the master sheet in the same workbook preferably in Column1. I am not looking for Unique or Distinct Employee ID's but one's which are present on all these 3 sheets.

Compulsorily the Employee ID has to be on all the 3 sheets then only should appear in the Master sheet. How can i achieve that?

Sample

Every time I manually copy paste the employee ids in Master sheet and then do a vlookup on all 3 sheet and then i extract the one's which appear on all the 3 sheets and delete the rest. This take sometime. Now I am looking for formula which can do that for me. Compare, search and then list down.


Solution

  • Try using the following formula:

    enter image description here


    • Formula used in cell A1 --> Method 1

    =LET(
         _Stores, TOCOL('*'!A:A,1),
         _Uniq, UNIQUE(_Stores),
         _Counts, MMULT(N(_Uniq=TOROW(_Stores)),SEQUENCE(ROWS(_Stores))^0),
         SORT(FILTER(_Uniq,_Counts=3)))
    

    Or, If applicable can use GROUPBY() --> Exclusively Applicable MS365 Office Insiders Version. Method 2:

    =LET(
         _Stores, TOCOL('*'!A:A,1),
         _Group, GROUPBY(_Stores,_Stores,ROWS,,0,1),
         TAKE(FILTER(_Group, TAKE(_Group,,-1)=3),,1))
    

    Note: When placing the formula in the cell the * in TOCOL() function will automatically change to TOCOL(Sheet1:Sheet3!A:A,1)


    Or, Can use BYROW() function - Method 3:

    =LET(
         _Stores, TOCOL(Sheet1:Sheet3!A:A,1),
         _Uniq, SORT(UNIQUE(_Stores)),
         FILTER(_Uniq, BYROW(_Uniq, LAMBDA(x, SUM(--(x=_Stores))))>=3))
    

    Explanations about the formula logic:

    • The above formulas are dynamic array formulas which uses firstly the LET() function, helps in eliminating redundant calculations, avoids using repeated formulas/ranges and improves working functionalities of Excel by performance.
    • _Stores variable is the source ranges from all the three sheets, in the example it uses TOCOL() function with an optional argument [ignore] --> 1 - ignore blanks as we are taking the whole column A:A from all the sheets, we are excluding the empty cells if any. This stacks one upon the other into a single array.
    • Next, using UNIQUE() function, extracting each unique values.
    • _Counts variable uses an MMULT() function which does a matrix product of two arrays. The output is an array of counts.
    • Finally, using FILTER() function to include those which have counts equal to 3 and lastly using SORT() function to return an A-Z sorted data.

    The algorithm remains same for the other two following formulas provided, the second uses a GROUPBY() function which is Beta/Office Insider Version, one needs to enable it while using MS365 and writing the above, it is bit shorter, and the last method uses a LAMBDA() helper function called BYROW() which does custom logical calculations per row in an array and returns one result per row as an array.

    The basic logic remains same, stack all the data into one, then get the uniques, do the counts and keep only those which are all present in the three stores.


    This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)

    enter image description here


    To use Power Query follow the steps:

    • First convert the source ranges into a tables and name it accordingly, for this example I have named it as Table1, Table2 and Table3

    • Next, open a blank query from Data Tab --> Get & Transform Data --> Get Data --> From Other Sources --> Blank Query

    • The above lets the Power Query window opens, now from Home Tab --> Advanced Editor --> And paste the following M-Code by removing whatever you see, and press Done

    let
        Source = Excel.CurrentWorkbook(),
        #"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Name], "Query")),
        #"Expanded Content" = Table.ExpandTableColumn(#"Filtered Rows", "Content", {"Column1"}, {"Column1"}),
        #"Removed Other Columns" = Table.SelectColumns(#"Expanded Content",{"Column1"}),
        #"Grouped Rows" = Table.Group(#"Removed Other Columns", {"Column1"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
        #"Filtered Rows1" = Table.SelectRows(#"Grouped Rows", each [Count] = 3),
        #"Sorted Rows" = Table.Sort(#"Filtered Rows1",{{"Column1", Order.Ascending}}),
        #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Count"})
    in
        #"Removed Columns"
    

    enter image description here


    • Lastly, to import it back to Excel --> Click on Close & Load or Close & Load To --> The first one which clicked shall create a New Sheet with the required output while the latter will prompt a window asking you where to place the result.

    Notes: The second step in the Power Query, I am only taking the tables from the three sheets and not the table which will be an output in the Master Sheet, hence excluding it or filtering it out. So, when you name the Tables and while pasting the M-Code do exclude output query, otherwise whenever you add new data and refresh the imported table in the Master Sheet it will keep duplicating the desired output.

    #"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Name], "Query"))
    

    However, in the above formulas if you don't have access to TOCOL() function then you can replace that part with the following, but it has a drawback, since it uses FILTERXML() and TEXTJOIN(), the former will not work on web version of Excel if you happen to use it, and the later has character limitations -->32767 characters (cell limit) which when reached will return #VALUE! error, but with the existing example it works without any issue.


    =FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN(",",,Sheet1:Sheet3!A:A),",","</b><b>")&"</b></a>","//b")