Search code examples
excelgoogle-sheets

Summarizing duplicate data without losses


I have a file with annualized data about businesses in my area, it is formatted in the way, that for each year, there is a seperate 'sheet' with all the businesses which were active that year. Now I want to create a 'master sheet' of all the data. Each business is represented by the following data:

id latitude longitude year
15214537 14.4248605 50.0786082 2017

I want to somehow import all the data from all seven sub-sheets onto the master sheet, whilst checking for duplicates via the id data point and if one is found I want to be able to tell which sub-sheets they are from. Maybe if it could check the year and then output true/false in an according column?

id latitude longitude 2017 2018 2019 2020
15214537 14.4248605 50.0786082 true true false true

This is the first 'real' programming problem I have ever tried to solve with excel/sheets so I have no idea where to start. Should I first import all the data onto the master sheet and then sort? How do I even sort it? Should I do this in Excel, Google Sheets, or would R be better?

Thanks for any help

Due to my limited programming knowledge I have tried to solve it via Asking ChatGPT which helped me figure most things out before, but now I don't know where to even begin.


Solution

  • Assuming your Google Sheets tabs are named 2017, 2018, 2019 and 2020, choose Insert > Sheet and put this formula in cell A1 of the new sheet:

    =let( 
      stackValues_, lambda(sn,ra,reduce(tocol(ø,2),sn,lambda(rs,s,let(r,indirect(s&"!"&ra),l,max(index(row(r)*iferror(r<>""))),d,filter(r,row(r)<=l),if(counta(iferror(d)),vstack(rs,d),rs))))),
    
      query( 
        stackValues_({ "2017", "2018", "2019", "2020" }, "A2:D"), 
        "select Col1, Col2, Col3, count(Col1) 
         group by Col1, Col2, Col3 
         pivot Col4 
         label Col1 'id', Col2 'latitude', Col3 'longitude' ", 
        0 
      ) 
    )
    

    Sheet 2017:

    id latitude longitude year
    15214537 14.4248605 50.0786082 2017
    15214538 1.110626182 41.82943312 2017

    Sheet 2018:

    id latitude longitude year
    15214537 14.4248605 50.0786082 2018
    15214539 7.323075285 35.85616867 2018

    Results:

    id latitude longitude 2017 2018
    15214537 14.4248605 50.0786082 1 1
    15214538 1.110626182 41.82943312 1
    15214539 7.323075285 35.85616867 1

    See let(), lambda(), reduce() and query().