I am in a Data Warehouse environment and have a requirement, that for an existing table(Say table "A") with approximate 450 columns with 100 million records in the table, need to add another 50 columns, However as the table is already huge and wide, and the source system cannot accommodate it, I would be receiving and loading an extended table(Say table "B") with 50 new columns, and a SurrogateKey in both the tables, So the new table is as well going to hold 100 million records. But 51 columns. The SurrogateKey in table B is an Identity column, so it by itself generates the Key.
I need the surrogate keys as it would be used to optimize the reporting layer.
In order to maintain the SurrogateKey in table A, once both the tables are loaded, I would have to run a procedure containing below code to update SK in A :
Update A
SET SK=B.SK
from A JOIN B
ON A.c1=B.c1 AND
A.c2=B.c2 AND
A.c3=B.c3 AND
A.c4=B.c4
Considering that I do not have a PK or any Index on both the tables, what would be the recommendation to add PK or Indexes, so that I get maximum performance when updating table A, as well as joining A and B.
Adding indexes will reduce update performance if you directly updating table A. I recommend create Primary key and required indexes and then use a temporary table to update the records and then you can truncate and load your table A.
In short,
Hopefully, this approach should help. Let me know if this works for you.