So I'm working with data from csv-files that are a bit chaotic. Lot's of duplicate rows with the last two values constituting what should be a separate header and data in that column, like this:
ID | Name | Date | field4 | field5 |
---|---|---|---|---|
1 | John | 21-03-23 | Flavour | Strawberry |
1 | John | 21-03-23 | Location | Kitchen |
2 | Jane | 21-04-18 | Flavour | Lime |
2 | Jane | 21-04-18 | Location | Basement |
I've cleaned data in Access up until now with just a INSTER INTO statement, since all duplicate rows with the same primary key will be overlooked. A query like this:
INSERT INTO b (ID, Name, Date, field4, field5)
SELECT a.ID, a.Name, a.Date, a.field4, a.field5
FROM a;
,which returns something like this:
ID | Name | Date | field4 | field5 |
---|---|---|---|---|
1 | John | 21-03-23 | Flavour | Strawberry |
2 | Jane | 21-04-18 | Flavour | Lime |
However, now I would need all the different values in field4 and field5 to be transformed into new columns, so the data won't be lost. Like this:
ID | Name | Date | Flavour | Location |
---|---|---|---|---|
1 | John | 21-03-23 | Strawberry | Kitchen |
2 | Jane | 21-04-18 | Lime | Basement |
How can this be done in SQL?
Use conditional aggregation:
select id, name, date,
max(iif(field4 = "Flavour", field5, null)) as flavour,
max(iif(field4 = "Location", field5, null)) as Location
from t
group by id, name, date;
This assumes that you know the columns that you want in the result set. If not, you would need to generate the query string in some way to do a more dynamic query.