Tags: python, sql, pyspark

Overcoming schema/column inconsistency across datasets


I've found out that the same fields/columns have a different business meaning for certain regions in my dataset.

I have a few markets, but let's take GB and US as an example. Most of the columns have the same meaning, but there are pairs which don't, like:

SB1 for US is Strength Evaluation

SB2 for US is Power Evaluation

while

SB1 for GB is Power Evaluation

SB2 for GB is Strength Evaluation

and it's the case across the whole dataset that for one country SB1 is Power and SB2 is Strength, while for another it's reversed, and so on. I'm ideally looking for advice in PySpark, but since I will be working in Databricks, SQL would do as well.

My silver layer looks like this:

ID  Market  CK   SB1  SB2  SbX  ColX
1   US      1US  2    1    9    9
2   US      2US  2    2    9    9
3   US      3US  1    1    9    9
1   GB      1GB  3    5    9    9
2   GB      2GB  4    4    9    9
3   GB      3GB  5    3    9    9

I guess the expected output in that scenario is the following (look at the SB1 and SB2 columns):

ID  Market  CK   SB1  SB2  SbX  ColX
1   US      1US  2    1    9    9
2   US      2US  2    2    9    9
3   US      3US  1    1    9    9
1   GB      1GB  5    3    9    9
2   GB      2GB  4    4    9    9
3   GB      3GB  3    5    9    9

Each dataset has over 50 columns, and there are almost 10 markets.

Any pro tips or ideas on how to handle this? I guess it can't be solved at the ingestion level, so the raw zone is not the place to make it happen; the curated dataset has to be transformed and the values filled accordingly.


Solution

  • You can use the standard SQL CASE..WHEN construct as follows:

    select id, market, ck,
           case market when 'US' then SB1
                       when 'GB' then SB2
                       else SB1  -- other markets keep their value instead of getting NULL
           end as SB1,
           case market when 'US' then SB2
                       when 'GB' then SB1
                       else SB2
           end as SB2,
           sbx, colx
      from your_table
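
    Since you asked for PySpark: the same CASE..WHEN logic maps directly to when/otherwise. A minimal sketch, assuming your silver-layer table is loaded into a DataFrame named df (the name is illustrative):

    from pyspark.sql import functions as F

    # Swap SB1/SB2 for GB rows; other markets keep their original values
    df_fixed = df.select(
        "ID", "Market", "CK",
        F.when(F.col("Market") == "GB", F.col("SB2")).otherwise(F.col("SB1")).alias("SB1"),
        F.when(F.col("Market") == "GB", F.col("SB1")).otherwise(F.col("SB2")).alias("SB2"),
        "SbX", "ColX",
    )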
    

    Alternatively, you can use UNION ALL as follows:

    select id, market, ck, sb1, sb2, sbx, colx from your_table where market = 'US'
    UNION ALL
    select id, market, ck, sb2 as sb1, sb1 as sb2, sbx, colx from your_table where market = 'GB'
    -- with more markets, add a branch per market (or a catch-all: where market not in ('US', 'GB'))
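
    With 50+ columns and almost 10 markets, hand-writing a CASE per column gets unwieldy. One way to generalize is to keep the per-market column mapping in one place and generate the expressions from it. A sketch, assuming a hypothetical column_map dictionary that you maintain yourself (any market/column pair not listed falls through unchanged):

    from pyspark.sql import functions as F

    # Hypothetical mapping: per market, which physical column feeds each logical column
    column_map = {
        "GB": {"SB1": "SB2", "SB2": "SB1"},
        # add an entry per market whose columns deviate
    }

    remapped = {c for m in column_map.values() for c in m}

    def resolve(col_name):
        # Build the CASE..WHEN chain for one logical column from the mapping
        expr = F.col(col_name)
        for market, mapping in column_map.items():
            if col_name in mapping:
                expr = F.when(F.col("Market") == market, F.col(mapping[col_name])).otherwise(expr)
        return expr.alias(col_name)

    df_fixed = df.select(*[resolve(c) if c in remapped else F.col(c) for c in df.columns])

    This keeps the business rules in a single dictionary, so onboarding another market means editing the mapping rather than rewriting the query.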