Search code examples
sqlcsvms-accessdata-cleaning

SQL: converting distinct column data in duplicate rows into columns in MS Access


So I'm working with data from csv-files that are a bit chaotic. Lot's of duplicate rows with the last two values constituting what should be a separate header and data in that column, like this:

ID Name Date field4 field5
1 John 21-03-23 Flavour Strawberry
1 John 21-03-23 Location Kitchen
2 Jane 21-04-18 Flavour Lime
2 Jane 21-04-18 Location Basement

I've cleaned data in Access up until now with just a INSTER INTO statement, since all duplicate rows with the same primary key will be overlooked. A query like this:

INSERT INTO b (ID, Name, Date, field4, field5)
SELECT a.ID, a.Name, a.Date, a.field4, a.field5
FROM a;

,which returns something like this:

ID Name Date field4 field5
1 John 21-03-23 Flavour Strawberry
2 Jane 21-04-18 Flavour Lime

However, now I would need all the different values in field4 and field5 to be transformed into new columns, so the data won't be lost. Like this:

ID Name Date Flavour Location
1 John 21-03-23 Strawberry Kitchen
2 Jane 21-04-18 Lime Basement

How can this be done in SQL?


Solution

  • Use conditional aggregation:

    select id, name, date,
           max(iif(field4 = "Flavour", field5, null)) as flavour,
           max(iif(field4 = "Location", field5, null)) as Location
    from t
    group by id, name, date;
    

    This assumes that you know the columns that you want in the result set. If not, you would need to generate the query string in some way to do a more dynamic query.